FURTHER DETAILS ON INFERENCE UNDER RIGHT
CENSORING FOR TRANSFORMATION MODELS WITH A
CHANGE-POINT BASED ON A COVARIATE
THRESHOLD

By Michael R. Kosorok and Rui Song

We consider linear transformation models applied to right censored
survival data with a change-point in the regression coefficient
based on a covariate threshold. We establish consistency and weak
convergence of the nonparametric maximum likelihood estimators.
The change-point parameter is shown to be $n$-consistent, while the
remaining parameters are shown to have the expected root-$n$
consistency. We show that the procedure is adaptive in the sense that
the non-threshold parameters are estimable with the same precision
as if the true threshold value were known. We also develop Monte
Carlo methods of inference for model parameters and score tests for
the existence of a change-point. A key difficulty here is that some of
the model parameters are not identifiable under the null hypothesis
of no change-point. Simulation studies establish the validity of the
proposed score tests for finite sample sizes.

1. Introduction. The linear transformation model states that a
continuous outcome $U$, given a $d$-dimensional covariate vector $Z$, has
the form

(1) $H(U) = -\beta'Z + \varepsilon,$

where $H$ is an increasing, unknown transformation function, $\beta \in \mathbb{R}^d$
are the unknown regression parameters of interest, and $\varepsilon$ has a known
distribution $F$. This model is readily applied to a failure time $T$ by
letting $U = \log T$ and $H(u) = \log A(e^u)$, where $A$ is an unspecified
integrated baseline hazard. Setting $F(s) = 1 - \exp(-e^s)$ results in the
Cox model, while setting $F(s) = e^s/(1 + e^s)$ results in the proportional
odds model. More generally, the transformation model for a survival time
$T$ conditionally on a time-dependent covariate
$\tilde{Z}(t) = \{Z(s), 0 \le s \le t\}$ takes the form

(2) $P[T > t \mid \tilde{Z}(t)] = S_Z(t) = \Lambda\left(\int_0^t e^{\beta'Z(u)}\,dA(u)\right),$

where $\Lambda$ is a known decreasing function with $\Lambda(0) = 1$. The model (2)
becomes model (1) when the covariates are time-independent and
$F(s) = 1 - \Lambda(e^s)$.
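
To make the link between (1) and (2) concrete, the following is a minimal sketch, assuming time-independent covariates so that the integral in (2) reduces to $e^{\beta'Z}A(t)$. The function names and the unit baseline hazard $A(t) = t$ are illustrative assumptions, not part of the paper.

```python
import math

def link_cox(u):
    """Lambda(u) = exp(-u): gives the Cox model."""
    return math.exp(-u)

def link_prop_odds(u):
    """Lambda(u) = 1/(1+u): gives the proportional odds model."""
    return 1.0 / (1.0 + u)

def survival(t, z, beta, cum_hazard, link):
    """S_Z(t) = Lambda(exp(beta'z) * A(t)) for a time-independent covariate z."""
    lin_pred = sum(b * zi for b, zi in zip(beta, z))
    return link(math.exp(lin_pred) * cum_hazard(t))

def error_cdf(s, link):
    """F(s) = 1 - Lambda(exp(s)), the error distribution implied by the link."""
    return 1.0 - link(math.exp(s))

if __name__ == "__main__":
    unit_hazard = lambda t: t  # hypothetical baseline A(t) = t
    print(survival(1.0, [1.0], [0.5], unit_hazard, link_cox))
    print(survival(1.0, [1.0], [0.5], unit_hazard, link_prop_odds))
```

With `link_cox`, `error_cdf` reproduces $F(s) = 1 - \exp(-e^s)$, and with `link_prop_odds` it reproduces $F(s) = e^s/(1+e^s)$, matching the two special cases named above.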

In data analysis, the assumption of linearity of the regression effect in
(2) is not always satisfied over the whole range of the covariate, and the
fit may be improved with a two-phase transformation model having a
change-point at an unknown threshold of a one-dimensional covariate $Y$.
Let $Z = (Z_1, Z_2)$, where $Z_1$ and $Z_2$ are possibly time-dependent
covariates in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively, where $p + q = d$ and
$q \ge 1$. The new model is obtained by replacing $\beta'Z(s)$ in (2) with

(3) $r_\xi(s; Z, Y) = \beta'Z(s) + [\alpha + \eta'Z_2(s)]\,1\{Y > \zeta\},$

where $\alpha$ is a scalar, $\eta \in \mathbb{R}^q$, $1\{B\}$ is the indicator of $B$, and
$\xi$ denotes the collected parameters $(\alpha, \beta, \eta, \zeta)$. We also require
$Y$ to be time-independent but allow it to possibly be one of the
covariates in $Z(t)$. The overall goal of this paper is to develop methods
of inference for this model applied to right censored data.

We note that for the special case when $\alpha = 0$ and $\Lambda(t) = e^{-t}$, the
model (3) becomes the Cox model considered by [22] under a slightly
different parameterization. Permitting a nonzero $\alpha$ allows the
possibility of a "bent-line" covariate effect. Suppose, for example, that
$Z_2$ is one-dimensional and time-independent, while $Z_1 \in \mathbb{R}^{d-1}$ may be
time-dependent. If we set $Y = Z_2$ and $\beta = (\beta_1', \beta_2)'$, where
$\beta_1 \in \mathbb{R}^{d-1}$ and $\beta_2 \in \mathbb{R}$, the model (3) becomes
$r_\xi(s; Z, Y) = \beta_1'Z_1(s) + \beta_2 Z_2 + (\alpha + \eta Z_2)\,1\{Z_2 > \zeta\}$.
When $\alpha = -\eta\zeta$, the covariate effect for $Z_2$ consists of two
connected linear segments. In many biological settings, such a bent-line
effect is realistic and can be much easier to interpret than a quadratic
or more complex nonlinear effect [9]. Hence including the intercept term
$\alpha$ is useful for applications.
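
The bent-line case can be checked numerically. The sketch below, with illustrative function names and parameter values that are assumptions of this example, evaluates the $Z_2$ part of $r_\xi$ when $Y = Z_2$ and shows that the restriction $\alpha = -\eta\zeta$ removes the jump at the threshold.

```python
def regression_effect(z2, beta2, alpha, eta, zeta):
    """beta2*z2 + (alpha + eta*z2) * 1{z2 > zeta}, the Z2 part of r_xi when Y = Z2."""
    shift = alpha + eta * z2 if z2 > zeta else 0.0
    return beta2 * z2 + shift

def bent_line(z2, beta2, eta, zeta):
    """Same effect under the restriction alpha = -eta*zeta (two connected segments)."""
    return regression_effect(z2, beta2, -eta * zeta, eta, zeta)

if __name__ == "__main__":
    for z2 in (1.0, 2.0, 2.000001, 4.0):
        # slope is beta2 = 1.0 below zeta = 2.0 and beta2 + eta = 1.5 above it
        print(z2, bent_line(z2, beta2=1.0, eta=0.5, zeta=2.0))
```

Without the restriction, the effect jumps by $\alpha + \eta\zeta$ as $Z_2$ crosses $\zeta$; with it, only the slope changes, from $\beta_2$ to $\beta_2 + \eta$.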

Linear transformation models of the form (1) have been widely used and
studied (see, for example, |2, iJ, 0, 0, E3, E3, EjEi, El ). Efficient methods of



estimation in the uncensored setting were rigorously studied by |a|, among 
others. The model (2) for right-censored data has also been studied
rigorously for a variety of specific choices of $\Lambda$ [2^], [25], [2a], [29];
for general but known $\Lambda$ [30]; and for certain parameterized families of
$\Lambda$ [17].

Change-point models have also been studied extensively and have proven 
to be popular in clinical research. Several researchers have considered a 
nonregular Cox model involving a two-phase regression on time-dependent 
covariates, with a change-point at an unknown time |l8l. l2Ct l2ll|. As
mentioned above, [22] considered the Cox model with a change-point at an
unknown threshold of a covariate. These authors studied the maximum partial
likelihood estimators of the parameters and the estimator of the baseline
hazard function. They show that the estimator of the threshold parameter
is $n$-consistent, while the regression parameters are $\sqrt{n}$-consistent. This
happens because the likelihood function is not differentiable with respect 
to the threshold parameter, and hence the usual Taylor expansion is not 
available. In this paper, we focus on the covariate threshold setting. While 
time threshold models are also interesting, we will not pursue them further 
in this paper because the underlying techniques for estimation and inference 
are quite distinct from the covariate threshold setting. 



The contribution of our paper builds on [23] in three important ways.
Firstly, we extend to general transformation models. This results in a signif- 
icant increase in complexity over the Cox model since estimation of the base- 
line hazard can no longer be avoided through the use of the partial-profile 
likelihood. Secondly, we study nonparametric maximum likelihood inference 
for all model parameters. As part of this, we show that the estimation proce- 
dure is adaptive in the sense that the non-threshold parameters — including 
the infinite-dimensional parameter A — are estimable with the same preci- 
sion as if the true threshold parameter were known. Thirdly, we develop 
hypothesis tests for the existence of a change-point. This is quite challeng- 
ing since some of the model parameters are no longer identifiable under the 
null hypothesis of no change-point. [1] considers similar nonstandard
testing problems when the model is fully parametric and establishes asymptotic
null and local alternative distributions of a number of likelihood-based test 
procedures. Unfortunately, Andrews' results are not directly applicable to 
our setting because of the presence of an infinite dimensional nuisance pa- 
rameter, the baseline integrated hazard A, and new methods are required. 

The next section, section 2, presents the data and model assumptions. 
The nonparametric maximum log-likelihood estimation (NPMLE) proce- 
dure is presented in section 3. In section 4, we establish the consistency of 
the estimators. Score and information operators of the regular parameters 
are given in section 5. Results on the convergence rates of the estimators are 
established in section 6. Section 7 presents weak convergence results for the 
estimators, including the asymptotic distribution of the change-point estima- 
tor and the asymptotic normality of the other parameters. This section also 
establishes the adaptive semiparametric efficiency mentioned above. Monte 
Carlo inference for the parameters is discussed in section 8. Methods for 
testing the existence of a change-point are then presented in section 9. A 
brief discussion on implementation and a small simulation study evaluating 
the moderate sample size performance of the proposed change-point tests 
are given in section 10. Proofs are given in section 11. 
2. The data set-up and model assumptions. The data
$X_i = (V_i, \delta_i, Z_i, Y_i)$, $i = 1, \ldots, n$, consists of $n$ i.i.d. realizations
of $X = (V, \delta, Z, Y)$, where $V = T \wedge C$, $\delta = 1\{T \le C\}$, and $C$ is a
right censoring time. The analysis is restricted to the interval $[0, \tau]$,
where $\tau < \infty$. The covariate $Y \in \mathbb{R}$ and $Z \equiv \{Z(t), t \in [0, \tau]\}$ is
assumed to be a caglad (left-continuous with right-hand limits) process
with $Z(t) = (Z_1'(t), Z_2'(t))' \in \mathbb{R}^p \times \mathbb{R}^q$, for all $t \in [0, \tau]$,
where $q \ge 1$ but $p = 0$ is allowed.
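
As a small illustration of the observation scheme, the sketch below builds $(V, \delta)$ from a latent failure time and a censoring time, with censoring truncated at $\tau$ so that no observation extends beyond $[0, \tau]$. The simulation settings (exponential times, a uniform $Y$, a scalar Gaussian $Z$ that does not affect $T$) are placeholder assumptions chosen only to produce data of the right shape.

```python
import random

def observe(t, c, tau):
    """Return (V, delta) with V = min(T, C) and delta = 1{T <= C}, where C is capped at tau."""
    c = min(c, tau)
    return min(t, c), 1 if t <= c else 0

def simulate(n, tau, rng):
    """Draw n records (V, delta, Z, Y); the distributions are illustrative placeholders."""
    data = []
    for _ in range(n):
        y = rng.uniform(0.0, 1.0)   # threshold covariate Y
        z = rng.gauss(0.0, 1.0)     # scalar time-independent covariate Z
        t = rng.expovariate(1.0)    # latent failure time (no covariate effect here)
        c = rng.expovariate(0.5)    # latent censoring time
        v, delta = observe(t, c, tau)
        data.append((v, delta, z, y))
    return data

if __name__ == "__main__":
    sample = simulate(5, tau=2.0, rng=random.Random(1))
    for record in sample:
        print(record)
```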

We assume that conditionally on $Z$ and $Y$, the survival function at time
$t$ has the form:

(4) $S_{Z,Y}(t) \equiv \Lambda\left(\int_0^t e^{r_\xi(u; Z, Y)}\,dA(u)\right),$

where $\Lambda$ is a known, thrice differentiable decreasing function with
$\Lambda(0) = 1$, $r_\xi(s; Z, Y)$ is as defined in (3), and $A$ is an unknown
increasing function restricted to $[0, \tau]$.

Let $G \equiv -\log\Lambda$, and define the derivatives $\dot\Lambda \equiv d\Lambda(t)/(dt)$,
$\ddot\Lambda \equiv d\dot\Lambda(t)/(dt)$, $\dot G \equiv dG(t)/(dt)$, $\ddot G \equiv d\dot G(t)/(dt)$, and
$\dddot G \equiv d\ddot G/(dt)$. We also define the collected parameters
$\gamma \equiv (\alpha, \eta, \beta)$, $\psi \equiv (\gamma, A)$, and $\theta \equiv (\psi, \zeta)$. We use $P$ to
denote the true probability measure, while the true parameter values are
indicated with a subscript 0.

We now make the following additional assumptions:

A1: $P[C = 0] = 0$, $P[C \ge \tau \mid Z, Y] = P[C = \tau \mid Z, Y] > 0$ almost surely,
and censoring is independent of $T$ given $(Z, Y)$ and uninformative.

A2: The total variation of $Z(\cdot)$ on $[0, \tau]$ is $\le m_0 < \infty$ almost surely.

B1: $\zeta_0 \in (a, b)$ for some known $-\infty < a < b < \infty$ with $P[Y < a] > 0$
and $P[Y > b] > 0$.

B2: For some neighborhood $\tilde V(\zeta_0)$ of $\zeta_0$:

(i) the density of $Y$, $h$, exists and is strictly positive, bounded and
continuous for all $y \in \tilde V(\zeta_0)$; and

(ii) the conditional law of $(C, Z)$ given $Y = y$, $\mathcal{L}_y$, is left-continuous
with right-hand limits over $\tilde V(\zeta_0)$.

B3: For some $t_1, t_2 \in (0, \tau]$, both $\mathrm{var}[Z(t_1) \mid Y = \zeta_0]$ and
$\mathrm{var}[Z(t_2) \mid Y = \zeta_0+]$ are positive definite.

B4: For some $t_3, t_4 \in (0, \tau]$, both $\mathrm{var}[Z(t_3) \mid Y < a]$ and
$\mathrm{var}[Z(t_4) \mid Y > b]$ are positive definite.

C1: $\alpha_0 \in \Upsilon \subset \mathbb{R}$, $\beta_0 \in B_1 \subset \mathbb{R}^d$, $\eta_0 \in B_2 \subset \mathbb{R}^q$, where
$d \ge q \ge 1$, and $\Upsilon$, $B_1$ and $B_2$ are open, convex, bounded and known.

C2: Either $\alpha_0 \ne 0$ or $\eta_0 \ne 0$.

C3: $A_0 \in \mathcal{A}$, where $\mathcal{A}$ is the set of all increasing functions
$A : [0, \tau] \mapsto [0, \infty)$ with $A(0) = 0$ and $A(\tau) < \infty$; and $A_0$ has
derivative $a_0$ satisfying $0 < a_0(t) < \infty$ for all $t \in [0, \tau]$.

D1: $G : [0, \infty) \mapsto [0, \infty)$ is thrice continuously differentiable, with
$G(0) = 0$, and, for each $u \in [0, \infty)$, $0 < \dot G(u), \Lambda(u) < \infty$ and
$\sup_{s \in [0, u]} |\dddot G(s)| < \infty$.

D2: For some $c_0 > 0$, both $\sup_{u \ge 0} |u^{c_0}\Lambda(u)| < \infty$ and
$\sup_{u \ge 0} |u^{1+c_0}\dot\Lambda(u)| < \infty$.
Conditions Al, A2, CI and C3 are commonly used for NPMLE con- 
sistency and identifiability in right-censored transformation models, while 
conditions Bl, B2, B3 and C2 are needed for change-point identifiability. 
As pointed out by a referee, the use of a time-dependent covariate will
require that $Z_i(V_j)$ be observed for each individual $i$ and for every $j$
such that $\delta_j = 1$ and $V_j \le V_i$. While this is often assumed in
theoretical contexts, it can be unrealistic in practice, where missing
values of $Z_i(t)$ are not unusual (see [19]). Frequently, data analysts
will simply carry the last observation of $Z_i(t)$ forward to avoid the
missingness problem. Unfortunately, this simple solution is not
necessarily valid. However, addressing this issue thoroughly is beyond the
scope of this paper, and we will only mention it again briefly in section
9, where we develop a test of the null hypothesis that there is no
change-point ($H_0 : \alpha_0 = 0$ and $\eta_0 = 0$). Also in section 9, we will
relax condition C2 to allow for a sequence of contiguous alternative
hypotheses that includes $H_0$. Condition B2(ii) is also needed to obtain
weak convergence for the NPMLE of $\zeta_0$. The continuity requirements at
each point $y$ can be restated in the following way: $\mathcal{L}_\zeta$ converges
weakly to $\mathcal{L}_y$, as $\zeta \uparrow y$; and $\mathcal{L}_\zeta$ converges weakly to
$\mathcal{L}_{y+}$, as $\zeta \downarrow y$, for some law $\mathcal{L}_{y+}$. It would require a
fairly pathological relationship among the variables $(C, Z, Y)$ for this
not to hold. Condition B4 will also be needed for the change-point test
developed in section 9.

Conditions D1 and D2 are also needed for asymptotic normality.
Condition D1 is quite similar to conditions (G.1) through (G.4) in [30] who
use the condition for developing asymptotic theory for transformation
models without a change-point. Condition D2 is slightly weaker than
conditions D2 and D3 of [17] who use the condition to obtain asymptotic
theory for frailty regression models without a change-point. The following
are several instances that satisfy conditions D1 and D2:

1. $\Lambda(u) = e^{-u}$ corresponds to the extreme value distribution and results
in the Cox model.

2. $\Lambda(u) = (1 + cu)^{-1/c}$, for any $c \in (0, \infty)$, corresponds to the family
of log-Pareto distributions and results in the odds-rate transformation
family. Taking the limit as $c \downarrow 0$ yields the Cox model, while $c = 1$
yields the proportional odds model.

3. $\Lambda(u) = E[e^{-Wu}]$, where $W$ is a positive frailty with $E[W^{-c}] < \infty$,
for some c > 0, and E [W^] < oo, corresponds to the family of frailty 
transformations. In addition to the odds-rate family, these conditions 
are satisfied by both the inverse Gaussian and log-normal families (see
[17]), as well as many other frailty families.

4. $\Lambda(u) = [1 + 2cu + u^2]^{-1}$, where $c \in (1/2, 1)$. Because this is the
Laplace transform of $t \mapsto e^{-ct}\sin\left(t\sqrt{1 - c^2}\right)/\sqrt{1 - c^2}$, it is not the
Laplace transform of a density. Hence this family is not a member of the
family of frailty transformations. Note, however, that taking the limit as
$c \uparrow 1$ results in the Laplace transform of the frailty density $te^{-t}$.

Verification of these conditions is routine for examples 1, 2 and 4 above, 
but verification for example 3 is slightly more involved: 

Lemma 1. Conditions Dl and D2 are satisfied for example 3 above. 
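
The transformation families listed in examples 1, 2 and 4 are easy to evaluate directly. The following sketch, whose function names are assumptions of this example, implements them together with $G = -\log\Lambda$, and can be used to check numerically that the odds-rate family approaches the Cox link as $c \downarrow 0$ and equals the proportional odds link at $c = 1$.

```python
import math

def odds_rate(u, c):
    """Lambda(u) = (1 + c*u)^(-1/c); the c -> 0 limit is exp(-u) (Cox)."""
    if c == 0.0:
        return math.exp(-u)
    # exp(-log1p(c*u)/c) is numerically stable for small c
    return math.exp(-math.log1p(c * u) / c)

def non_frailty(u, c):
    """Lambda(u) = 1 / (1 + 2*c*u + u^2), example 4 in the text."""
    return 1.0 / (1.0 + 2.0 * c * u + u * u)

def G(u, link, *args):
    """G = -log(Lambda) for any of the links above."""
    return -math.log(link(u, *args))

if __name__ == "__main__":
    for c in (1.0, 0.1, 1e-6, 0.0):
        print(c, odds_rate(1.5, c))
    print(non_frailty(1.0, 0.75), G(1.0, odds_rate, 1.0))
```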

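3. Nonparametric maximum log-likelihood estimation. The
nonparametric log-likelihood has the form $L_n(\psi, \zeta) \equiv$

(5) $P_n\left\{\delta\log a(V) + l_1^\psi(V, \delta, Z)\,1\{Y \le \zeta\} + l_2^\psi(V, \delta, Z)\,1\{Y > \zeta\}\right\},$

where

$l_1^\psi(V, \delta, Z) \equiv \int_0^\tau \left[\log\dot G(H_1^\psi(s)) + \beta'Z(s)\right] dN(s) - G(H_1^\psi(V)),$

$l_2^\psi(V, \delta, Z) \equiv \int_0^\tau \left[\log\dot G(H_2^\psi(s)) + \beta'Z(s) + \alpha + \eta'Z_2(s)\right] dN(s) - G(H_2^\psi(V)),$

where $N(t) \equiv 1\{V \le t\}\delta$, $\tilde Y(s) \equiv 1\{V \ge s\}$, $a \equiv dA/dt$,
$H_1^\psi(t) \equiv \int_0^t \tilde Y(s)e^{\beta'Z(s)}\,dA(s)$,
$H_2^\psi(t) \equiv \int_0^t \tilde Y(s)e^{\beta'Z(s) + \alpha + \eta'Z_2(s)}\,dA(s)$, and $P_n$ is the
empirical probability measure.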

As discussed by [23], the maximum likelihood estimator for $a$ does not
exist, since any unrestricted maximizer of (5) puts mass only at observed
failure times and is thus not a continuous hazard. We replace $a(u)$ in
$L_n(\psi, \zeta)$ with $n\Delta A(u)$ as suggested in J2j] who remarked that this form
of the empirical log-likelihood function is asymptotically equal to the
true log-likelihood function in certain instances. Let $\tilde L_n(\psi, \zeta)$ be this
modified log-likelihood. Note that the maximum likelihood estimator for
$\zeta$ is not unique, since the likelihood is constant in $\zeta$ over the
intervals $[Y_{(r)}, Y_{(r+1)})$, where $Y_{(1)} < \cdots < Y_{(r)} < \cdots < Y_{(n)}$ are the
order statistics of $Y$. For this reason, we only need to consider $\zeta$ at
the values of the $Y$ order statistics.

imsart-aos ver. 2006/01/04 file: cl4.tex date: February 2, 2008

The estimators are obtained in the following way: For fixed $\zeta$, we
maximize the fully nonparametric log-likelihood over $\psi$, to obtain the
profile log-likelihood $pL_n(\zeta) \equiv \sup_\psi \tilde L_n(\psi, \zeta)$. We then maximize
$pL_n(\zeta)$ over $\zeta$, to obtain $\hat\zeta_n$, and then compute
$\hat\psi_n = \mathrm{argmax}_\psi\,\tilde L_n(\psi, \hat\zeta_n)$. This yields the NPMLE
$\hat\theta_n = (\hat\psi_n, \hat\zeta_n)$ for $\theta_0$. Hence we obtain an estimator for $A_0$
but not for $a_0$.
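
The two-stage procedure above amounts to a grid search over the observed values of $Y$ with an inner maximization at each candidate. The sketch below shows only that outer loop; `inner_fit` is a hypothetical callback standing in for the maximization of the modified log-likelihood over $\psi$ at fixed $\zeta$, and the restriction of candidates to the open interval between `lower` and `upper` mirrors condition B1.

```python
def profile_changepoint(y_values, inner_fit, lower, upper):
    """Maximize a profile log-likelihood over candidate thresholds.

    y_values  : observed values of the threshold covariate Y
    inner_fit : hypothetical callable zeta -> (max_loglik, psi_hat)
    lower, upper : known bounds a < zeta < b
    Returns (best_loglik, zeta_hat, psi_hat).
    """
    candidates = sorted({y for y in y_values if lower < y < upper})
    if not candidates:
        raise ValueError("no observed Y value lies strictly between the bounds")
    best = None
    for zeta in candidates:
        loglik, psi_hat = inner_fit(zeta)
        # keep the first (smallest) maximizer when there are ties
        if best is None or loglik > best[0]:
            best = (loglik, zeta, psi_hat)
    return best

if __name__ == "__main__":
    # toy profile with a peak at 3; a real inner_fit would refit the model
    toy_fit = lambda zeta: (-(zeta - 3.0) ** 2, None)
    print(profile_changepoint([5, 1, 3, 2, 4], toy_fit, lower=0.0, upper=6.0))
```

Because the likelihood is constant in $\zeta$ between consecutive order statistics, evaluating only at observed $Y$ values loses nothing, at the cost of one inner fit per candidate.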

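4. Consistency. To study consistency, we first characterize the NPMLE
$\hat\theta_n$. Consider the following one-dimensional submodels for $A$:

$t \mapsto A_t \equiv \int_0^{(\cdot)} (1 + tg(s))\,dA(s),$

where $g$ is an arbitrary non-negative bounded function. A score function
for $A$, defined as the derivative of $\tilde L_n(\xi, A_t)$ with respect to $t$ at
$t = 0$, is

(6) $P_n\left\{\delta g(V) - \left[\dot G(H^\theta(V)) - \delta\,\frac{\ddot G(H^\theta(V))}{\dot G(H^\theta(V))}\right] \int_0^\tau \tilde Y(s)e^{r_\xi(s; Z, Y)}g(s)\,dA(s)\right\},$

where $H^\theta(t) \equiv \int_0^t \tilde Y(s)e^{r_\xi(s; Z, Y)}\,dA(s)$. For any fixed $\xi$, let
$\hat A_\xi$ denote the maximizer of $A \mapsto \tilde L_n(\xi, A)$, and let
$\hat\theta_\xi \equiv (\xi, \hat A_\xi)$. Then the score function (6) is equal to zero when
evaluated at $\hat\theta_\xi$. We select $g(u) = 1\{u \le t\}$, insert this into (6),
and equate the resulting expression to zero: $\hat A_\xi(t) =$

(7) $\int_0^t \left[P_n\left\{\tilde Y(s)e^{r_\xi(s; Z, Y)}\left(\dot G(H^{\hat\theta_\xi}(V)) - \delta\,\frac{\ddot G(H^{\hat\theta_\xi}(V))}{\dot G(H^{\hat\theta_\xi}(V))}\right)\right\}\right]^{-1} P_n\{dN(s)\} \equiv \int_0^t \left\{P_n W(s; \hat\theta_\xi)\right\}^{-1} P_n\{dN(s)\}.$

Now the profile likelihood has the form
$pL_n(\zeta) = \max_\gamma \tilde L_n\left((\gamma, \hat A_{(\gamma, \zeta)}), \zeta\right)$.
The above characterization facilitates the following consistency results
for $\hat\theta_n$: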


Lemma 2. Under the regularity conditions of section 2, the transfor- 
mation model with a change-point based on a covariate threshold is identifi- 
able. 

Lemma 3. Under the regularity conditions of section 2, $\hat A_n$ is
asymptotically bounded, and thus the NPMLE $\hat\theta_n$ exists.

Using these results, we can establish the uniform consistency of $\hat\theta_n$:

Theorem 1. Under the regularity conditions of section 2, $\hat\theta_n$
converges outer almost surely to $\theta_0$ in the uniform norm.
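
For intuition about the self-consistency equation for $\hat A_\xi$ in section 4, consider the Cox case $G(u) = u$, where $\dot G = 1$ and $\ddot G = 0$, so the weight reduces to the at-risk indicator times $e^{r_\xi}$ and the equation has a closed-form, Breslow-type solution. The sketch below computes its jumps, assuming time-independent covariates and linear predictors `lin_pred` already evaluated at a fixed $\xi$; for general $G$ the weight depends on $\hat A_\xi$ itself and an iterative solution would be needed, which this example does not attempt.

```python
import math

def cox_case_jumps(times, events, lin_pred):
    """Jumps of the cumulative hazard estimate when G(u) = u.

    times    : observed times V_i
    events   : indicators delta_i (1 = failure, 0 = censored)
    lin_pred : r_xi evaluated for each subject (time-independent covariates)
    Returns a sorted list of (failure_time, jump_size).
    """
    n = len(times)
    jumps = {}
    for j in range(n):
        if events[j] == 1:
            s = times[j]
            # sum of exp(r_i) over subjects still at risk at s
            at_risk = sum(math.exp(lin_pred[i]) for i in range(n) if times[i] >= s)
            jumps[s] = jumps.get(s, 0.0) + 1.0 / at_risk
    return sorted(jumps.items())

if __name__ == "__main__":
    print(cox_case_jumps([1.0, 2.0, 3.0], [1, 0, 1], [0.0, 0.0, 0.0]))
```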

5. Score and information operators for regular parameters. In
this section, we derive the score and information operators for the
collected parameters $\psi$. We refer to these parameters as the regular
parameters because, as we will see in section 6, these parameters converge
at the $\sqrt{n}$ rate. On the other hand, $\hat\zeta_n$ converges at the $n$ rate and
thus the parameter $\zeta$ is not regular. The score and information operators
for $\psi$ are needed for the convergence rate and weak limit results of
sections 6 and 7.

Let $\mathcal{H}$ denote the space of the elements $h = (h_1, h_2, h_3, h_4)$ such that
$h_1 \in \mathbb{R}$, $h_2 \in \mathbb{R}^q$, $h_3 \in \mathbb{R}^d$, and $h_4 \in D[0, \tau]$, where $D[0, \tau]$ is
the space of cadlag functions (right-continuous with left-hand limits) on
$[0, \tau]$. We denote by $BV$ the subspace of $D[0, \tau]$ consisting of functions
that are of bounded variation over the interval $[0, \tau]$. Define, for future
use, the following linear functional for each $\theta = (\psi, \zeta)$ and each
$t \in [0, \tau]$:

(8) $R^t_{\zeta,\psi}(f) \equiv \int_0^t f(u)\tilde Y(u)e^{r_\xi(u; Z, Y)}\,dA(u),$

where $f$ is an element or vector of elements in $BV$. Also let
$\rho_1(h) \equiv (|h_1|^2 + \|h_2\|^2 + \|h_3\|^2 + \|h_4\|_v^2)^{1/2}$ and
$\mathcal{H}_r \equiv \{h \in \mathcal{H} : \rho_1(h) \le r\}$, where $\|\cdot\|_v$ is the total variation
norm on $BV$ and $r \in (0, \infty)$.

The parameter $\psi \in \Psi \equiv \Upsilon \times B_2 \times B_1 \times \mathcal{A}$ can be considered a linear
functional on $\mathcal{H}_r$ by defining
$\psi(h) \equiv h_1\alpha + h_2'\eta + h_3'\beta + \int_0^\tau h_4(u)\,dA(u)$, $h \in \mathcal{H}_r$. Viewed
this way, $\Psi$ is a subset of $\ell^\infty(\mathcal{H}_r)$ with uniform norm
$\|\psi\|_{(r)} \equiv \sup_{h \in \mathcal{H}_r} |\psi(h)|$, where $\ell^\infty(B)$ is the space of bounded
functionals on $B$. Note that $\mathcal{H}_1$ is rich enough to extract all
components of $\psi$. This is easy to see for the Euclidean components; and,
for the $A$ component, it works by using the elements
$\{h : h_1 = 0, h_2 = 0, h_3 = 0, h_4(u) = 1\{u \le t\}, t \in [0, \tau]\} \subset \mathcal{H}_1$.

In section 5.1, we derive the score operator; while in section 5.2 we derive
the information operator and establish its continuous invertibility.

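5.1. The score operator. Using the one-dimensional submodel

$t \mapsto \psi_t \equiv \psi + t\left(h_1, h_2, h_3, \int_0^{(\cdot)} h_4(u)\,dA(u)\right), \quad h \in \mathcal{H}_r,$

the score operator takes the form

$U^\tau_{n,\zeta}(\psi)(h) \equiv \left.\frac{\partial}{\partial t}\tilde L_n(\psi_t, \zeta)\right|_{t=0} = P_n U^\tau_\zeta(\psi)(h),$

where $U^\tau_\zeta(\psi)(h) \equiv U^\tau_{\zeta,1}(\psi)(h_1) + U^\tau_{\zeta,2}(\psi)(h_2) + U^\tau_{\zeta,3}(\psi)(h_3) + U^\tau_{\zeta,4}(\psi)(h_4)$,
and

$U^\tau_{\zeta,1}(\psi)(h_1) \equiv 1\{Y > \zeta\}\left\{\int_0^\tau h_1\,dN(s) - \Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(h_1)\right\},$

$U^\tau_{\zeta,2}(\psi)(h_2) \equiv 1\{Y > \zeta\}\left\{\int_0^\tau Z_2'(s)h_2\,dN(s) - \Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2'h_2)\right\},$

$U^\tau_{\zeta,3}(\psi)(h_3) \equiv \int_0^\tau Z'(u)h_3\,dN(u) - \Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z'h_3),$

$U^\tau_{\zeta,4}(\psi)(h_4) \equiv \int_0^\tau h_4(u)\,dN(u) - \Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(h_4),$

$\Xi^{(0)}_\theta(\tau) \equiv 1\{Y \le \zeta\}\,\Xi^{(0)}_{\theta,1}(\tau) + 1\{Y > \zeta\}\,\Xi^{(0)}_{\theta,2}(\tau),$

and where, for $j = 1, 2$,

$\Xi^{(0)}_{\theta,j}(\tau) \equiv \dot G(H_j^\psi(V \wedge \tau)) - \delta\,\frac{\ddot G(H_j^\psi(V \wedge \tau))}{\dot G(H_j^\psi(V \wedge \tau))}.$

The dependence in the notation on $\tau$ will prove useful in later developments.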

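5.2. The information operator. To obtain the information operator, we
can differentiate the expectation of the score operator using the map
$t \mapsto \psi + t\psi_1$, where $\psi, \psi_1 \in \Psi$. The information operator,
$\sigma_\theta : \mathcal{H}_\infty \mapsto \mathcal{H}_\infty$, where $\mathcal{H}_\infty \equiv \{h : h \in \mathcal{H}_r$ for some $r < \infty\}$,
satisfies

(9) $\psi_1(\sigma_\theta(h)) = -\left.\frac{\partial}{\partial t} P U^\tau_\zeta(\psi + t\psi_1)(h)\right|_{t=0}$

for every $h \in \mathcal{H}_\infty$. Taking the Gateaux derivative in (9), we obtain
$\sigma_\theta(h) =$

(10) $\begin{pmatrix} \sigma_\theta^{11} & \sigma_\theta^{12} & \sigma_\theta^{13} & \sigma_\theta^{14} \\ \sigma_\theta^{21} & \sigma_\theta^{22} & \sigma_\theta^{23} & \sigma_\theta^{24} \\ \sigma_\theta^{31} & \sigma_\theta^{32} & \sigma_\theta^{33} & \sigma_\theta^{34} \\ \sigma_\theta^{41} & \sigma_\theta^{42} & \sigma_\theta^{43} & \sigma_\theta^{44} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \end{pmatrix} \equiv P \begin{pmatrix} \hat\sigma_\theta^{11} & \hat\sigma_\theta^{12} & \hat\sigma_\theta^{13} & \hat\sigma_\theta^{14} \\ \hat\sigma_\theta^{21} & \hat\sigma_\theta^{22} & \hat\sigma_\theta^{23} & \hat\sigma_\theta^{24} \\ \hat\sigma_\theta^{31} & \hat\sigma_\theta^{32} & \hat\sigma_\theta^{33} & \hat\sigma_\theta^{34} \\ \hat\sigma_\theta^{41} & \hat\sigma_\theta^{42} & \hat\sigma_\theta^{43} & \hat\sigma_\theta^{44} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \end{pmatrix},$

where

$\hat\sigma_\theta^{11}(h_1) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau) + \Xi^{(1)}_\theta(\tau)H^\theta(V \wedge \tau)\right\} R^\tau_{\zeta,\psi}(h_1),$

$\hat\sigma_\theta^{12}(h_2) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau) + \Xi^{(1)}_\theta(\tau)H^\theta(V \wedge \tau)\right\} R^\tau_{\zeta,\psi}(Z_2'h_2),$

$\hat\sigma_\theta^{13}(h_3) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau) + \Xi^{(1)}_\theta(\tau)H^\theta(V \wedge \tau)\right\} R^\tau_{\zeta,\psi}(Z'h_3),$

$\hat\sigma_\theta^{14}(h_4) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau) + \Xi^{(1)}_\theta(\tau)H^\theta(V \wedge \tau)\right\} R^\tau_{\zeta,\psi}(h_4),$

$\hat\sigma_\theta^{21}(h_1) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2h_1) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2)R^\tau_{\zeta,\psi}(h_1)\right\},$

$\hat\sigma_\theta^{22}(h_2) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2Z_2'h_2) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2)R^\tau_{\zeta,\psi}(Z_2'h_2)\right\},$

$\hat\sigma_\theta^{23}(h_3) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2Z'h_3) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2)R^\tau_{\zeta,\psi}(Z'h_3)\right\},$

$\hat\sigma_\theta^{24}(h_4) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2h_4) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2)R^\tau_{\zeta,\psi}(h_4)\right\},$

$\hat\sigma_\theta^{31}(h_1) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Zh_1) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z)R^\tau_{\zeta,\psi}(h_1)\right\},$

$\hat\sigma_\theta^{32}(h_2) \equiv 1\{Y > \zeta\}\left\{\Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(ZZ_2'h_2) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z)R^\tau_{\zeta,\psi}(Z_2'h_2)\right\},$

$\hat\sigma_\theta^{33}(h_3) \equiv \Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(ZZ'h_3) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z)R^\tau_{\zeta,\psi}(Z'h_3),$

$\hat\sigma_\theta^{34}(h_4) \equiv \Xi^{(0)}_\theta(\tau)R^\tau_{\zeta,\psi}(Zh_4) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z)R^\tau_{\zeta,\psi}(h_4),$

$\hat\sigma_\theta^{41}(h_1)(u) \equiv 1\{Y > \zeta\}\tilde Y(u)e^{r_\xi(u; Z, Y)}\left[\Xi^{(0)}_\theta(\tau)h_1 + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(h_1)\right],$

$\hat\sigma_\theta^{42}(h_2)(u) \equiv 1\{Y > \zeta\}\tilde Y(u)e^{r_\xi(u; Z, Y)}\left\{\Xi^{(0)}_\theta(\tau)Z_2'(u)h_2 + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z_2'h_2)\right\},$

$\hat\sigma_\theta^{43}(h_3)(u) \equiv \tilde Y(u)e^{r_\xi(u; Z, Y)}\left\{\Xi^{(0)}_\theta(\tau)Z'(u)h_3 + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(Z'h_3)\right\},$

$\hat\sigma_\theta^{44}(h_4)(u) \equiv \tilde Y(u)e^{r_\xi(u; Z, Y)}\left\{\Xi^{(0)}_\theta(\tau)h_4(u) + \Xi^{(1)}_\theta(\tau)R^\tau_{\zeta,\psi}(h_4)\right\},$

and where

$\Xi^{(1)}_\theta(\tau) \equiv \ddot G(H^\theta(V \wedge \tau)) - \delta\left[\frac{\dddot G(H^\theta(V \wedge \tau))}{\dot G(H^\theta(V \wedge \tau))} - \left\{\frac{\ddot G(H^\theta(V \wedge \tau))}{\dot G(H^\theta(V \wedge \tau))}\right\}^2\right].$

Note that all of the above operators are clearly bounded whenever $\theta$ is
bounded.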

The following lemma strengthens the above Gateaux derivative to a Frechet 
derivative. We will need this strong differentiability to obtain weak conver- 
gence of our estimators. 

Lemma 4. Under the regularity conditions of section 2 and for any
$\zeta \in [a, b]$ and $\psi_1 \in \Psi$, the operator $\psi \mapsto PU^\tau_\zeta(\psi)$ is Frechet
differentiable at $\psi_1$, with derivative $-\psi(\sigma_{\theta_1}(h))$, where
$\theta_1 \equiv (\psi_1, \zeta)$, $h$ ranges over $\mathcal{H}_r$ and is the index for
$PU^\tau_\zeta(\psi)(\cdot)$, $\psi$ ranges over the linear span $\mathrm{lin}\,\Psi$ of $\Psi$, and
$0 < r < \infty$.

The following lemma gives us the desired continuous invertibility of both
$\sigma_{\theta_0}$ and the operator $\psi \mapsto \psi(\sigma_{\theta_0}(\cdot))$. This last operator will be
needed for weak convergence of regular parameters.

Lemma 5. Under the regularity conditions of section 2, the linear
operator $\sigma_{\theta_0} : \mathcal{H}_\infty \mapsto \mathcal{H}_\infty$ is continuously invertible and onto, with
inverse $\sigma_{\theta_0}^{-1}$. Moreover, the linear operator $\psi \mapsto \psi(\sigma_{\theta_0}(\cdot))$, as a
map from and to $\mathrm{lin}\,\Psi$, is also continuously invertible and onto, with
inverse $\psi \mapsto \psi(\sigma_{\theta_0}^{-1}(\cdot))$.

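6. The convergence rates of the estimators. To determine the
convergence rates of the estimators, we need to study closely the
log-likelihood process $\tilde L_n(\theta)$ near its maximizer. In the parametric
setting, this process can be approximated by its expectation which can be
shown to be locally concave. For the Cox model, as in [22], this same
procedure can be applied to the partial likelihood which shares the local
concavity features of a parametric likelihood. Unfortunately, in our
present set-up, studying the expectation of $\tilde L_n(\theta)$ will lead to problems
since $A_0$ has a density and thus $\Delta A_0(t) = 0$ for all $t \in [0, \tau]$. Hence
$\tilde L_n(\theta_0) = -\infty$, and a new approach is needed. The approach we take
involves a careful reparameterization of $\hat A_n$.

From section 4, we know that the maximizer
$\hat A_n(t) = \int_0^t \{P_n W(s; \hat\theta_n)\}^{-1}\,dG_n(s)$, where $G_n(t) \equiv P_n N(t)$ and
$W(\cdot; \cdot)$ is as defined in (7). It is easy to see that for all $n$ large
enough and all $\theta$ sufficiently close to $\theta_0$, $t \mapsto P_n W(t; \theta)$ is bounded
below and above and in total variation, with large probability. Thus, if we
use the reparameterization
$\Gamma(\cdot) \mapsto A_n^{(\Gamma)}(\cdot) \equiv \int_0^{(\cdot)} \exp\{-\Gamma(s)\}\,dG_n(s)$, and maximize
$\tilde L_n(\xi, A_n^{(\Gamma)})$ over $\xi$ and $\Gamma$, where $\Gamma \in BV$, we will achieve the same
NPMLE as before. Note that the $\Gamma$ component of the maximizer of
$\tilde L_n(\xi, A_n^{(\Gamma)})$ is therefore just $\hat\Gamma_n(\cdot) = \log P_n W(\cdot; \hat\theta_n)$, so that
$\exp\{-\hat\Gamma_n\} = \{P_n W(\cdot; \hat\theta_n)\}^{-1}$.

Define $\Gamma_0(\cdot) \equiv \log(PW(\cdot; \theta_0))$ and $\theta_n(\zeta, \gamma, \Gamma) \equiv (\zeta, \gamma, A_n^{(\Gamma)})$, and
note that the reparameterized NPMLE $(\hat\zeta_n, \hat\gamma_n, \hat\Gamma_n)$ is the maximizer of
the process

$(\zeta, \gamma, \Gamma) \mapsto X_n(\zeta, \gamma, \Gamma) \equiv \tilde L_n(\theta_n(\zeta, \gamma, \Gamma)) - \tilde L_n(\theta_n(\zeta_0, \gamma_0, \Gamma_0))$

$= P_n\left\{\int_0^\tau \left[-\Gamma(t) + \Gamma_0(t) + \log\frac{\dot G(H^{\theta_n(\zeta,\gamma,\Gamma)}(t))}{\dot G(H^{\theta_n(\zeta_0,\gamma_0,\Gamma_0)}(t))} + (r_\xi - r_{\xi_0})(t; Z, Y)\right] dN(t) - \left(G(H^{\theta_n(\zeta,\gamma,\Gamma)}(V)) - G(H^{\theta_n(\zeta_0,\gamma_0,\Gamma_0)}(V))\right)\right\}.$

We will argue shortly that $X_n$ is uniformly consistent for the function

$(\zeta, \gamma, \Gamma) \mapsto X(\zeta, \gamma, \Gamma)$

$= P\left\{\int_0^\tau \left[-\Gamma(t) + \Gamma_0(t) + \log\frac{\dot G(H^{\theta_0(\zeta,\gamma,\Gamma)}(t))}{\dot G(H^{\theta_0}(t))} + (r_\xi - r_{\xi_0})(t; Z, Y)\right] dN(t) - \left(G(H^{\theta_0(\zeta,\gamma,\Gamma)}(V)) - G(H^{\theta_0}(V))\right)\right\},$

where $\theta_0(\zeta, \gamma, \Gamma) \equiv (\zeta, \gamma, A_0^{(\Gamma)})$,
$A_0^{(\Gamma)}(\cdot) \equiv \int_0^{(\cdot)} \exp\{-\Gamma(s)\}\,dG_0(s)$, and $G_0(t) \equiv PN(t)$. It will
occasionally be useful to use the shorthand $\lambda \equiv (\gamma, \Gamma)$,
$\hat\lambda_n \equiv (\hat\gamma_n, \hat\Gamma_n)$ and $\lambda_0 \equiv (\gamma_0, \Gamma_0)$.

Define the modified parameter space $\Theta^* \equiv (a, b) \times \Upsilon \times B_2 \times B_1 \times BV$;
and, for each $h = (h_1, h_2, h_3, h_4, h_5) \in \mathbb{R} \times \mathcal{H}_\infty$, define the metric
$\rho_2(h) \equiv (|h_1| + |h_2|^2 + \|h_3\|^2 + \|h_4\|^2 + \|h_5\|_\infty^2)^{1/2}$, where $\|\cdot\|_\infty$ is
the uniform norm. Note that $|h_1|$ is deliberately not squared. For each
$\epsilon > 0$ and $k < \infty$, define
$B_\epsilon^{*k} \equiv \{(\zeta, \lambda) \in \Theta^* : \rho_2((\zeta, \lambda) - (\zeta_0, \lambda_0)) < \epsilon, \|\Gamma\|_v < k\}$. Note
that for some $k_0 < \infty$ and any $\epsilon > 0$, $(\hat\zeta_n, \hat\lambda_n)$ is eventually in
$B_\epsilon^{*k_0}$ for all $n$ large enough by theorem 1 above combined with lemma 6
below:

Lemma 6. There exists a $k_0 < \infty$ such that
$\limsup_{n \to \infty} \|\hat\Gamma_n\|_v \le k_0$ and $\lim_{n \to \infty} \|\hat\Gamma_n - \Gamma_0\|_\infty = 0$ outer almost
surely.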

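Now we study the local behavior of $X$. First fix $\zeta \in (a, b)$. Since, for
any $g \in BV$,

$\left.\frac{\partial A_0^{(\Gamma + tg)}(\cdot)}{\partial t}\right|_{t=0} = -\int_0^{(\cdot)} g(s)\,dA_0^{(\Gamma)}(s),$

we obtain that the first derivative of $(\gamma, \Gamma) \mapsto X(\zeta, \gamma, \Gamma)$ in the
direction $h \in \mathcal{H}_\infty$ is precisely $-PU^\tau_\zeta(\gamma, A_0^{(\Gamma)})(h)$. Moreover, by
definition of the score and information operators, the second derivative in
the same direction is $-\psi_\Gamma\left(\sigma_{(\zeta,\gamma,A_0^{(\Gamma)})}(h)\right)$, where
$\psi_\Gamma \equiv \left(h_1, h_2, h_3, \int_0^{(\cdot)} h_4(s)\,dA_0^{(\Gamma)}(s)\right)$. At the point
$(\zeta, \gamma, \Gamma) = (\zeta_0, \gamma_0, \Gamma_0)$, the first derivative is 0, while the second
derivative is $< 0$, by lemma 5. By the smoothness of the score and
information operators ensured by conditions D1 and D2, and by the
arbitrariness of $h$, we now have that the function
$(\gamma, \Gamma) \mapsto X(\zeta, \gamma, \Gamma)$ is concave for every $(\zeta, \gamma, \Gamma) \in B_\epsilon^{*k_0}$, for
sufficiently small $\epsilon$.

Now note that $X(\zeta, \gamma, \Gamma) = P\tilde l(\zeta, \gamma, \Gamma) - P\tilde l(\zeta_0, \gamma_0, \Gamma_0)$, where
$\tilde l(\zeta, \gamma, \Gamma) \equiv$

(11) $-\int_0^\tau \Gamma(t)\,dN(t) + l_1^{\psi(\gamma,\Gamma)}(V, \delta, Z)\,1\{Y \le \zeta\} + l_2^{\psi(\gamma,\Gamma)}(V, \delta, Z)\,1\{Y > \zeta\},$

and where $l_j^\psi$, $j = 1, 2$, are as defined in section 3, and
$\psi(\gamma, \Gamma) \equiv (\gamma, A_0^{(\Gamma)})$. By condition B2, we now have that for small
enough $\epsilon > 0$, $\zeta \mapsto X(\zeta, \gamma, \Gamma)$ is right and left continuously
differentiable for all $(\zeta, \gamma, \Gamma) \in B_\epsilon^{*k_0}$, with left partial derivative

$\dot X^-_\zeta(\gamma, \Gamma) \equiv P\left\{l_1^{\psi(\gamma,\Gamma)}(V, \delta, Z) - l_2^{\psi(\gamma,\Gamma)}(V, \delta, Z) \mid Y = \zeta\right\}$

and right partial derivative

$\dot X^+_\zeta(\gamma, \Gamma) \equiv P\left\{l_1^{\psi(\gamma,\Gamma)}(V, \delta, Z) - l_2^{\psi(\gamma,\Gamma)}(V, \delta, Z) \mid Y = \zeta+\right\}.$

We now have the following lemmas on the local behavior of $X$ with respect
to $\zeta$:

Lemma 7. Under the conditions of section 2, $\dot X^-_{\zeta_0}(\gamma_0, \Gamma_0) > 0$ and
$\dot X^+_{\zeta_0}(\gamma_0, \Gamma_0) < 0$.

Lemma 8. There exists $\epsilon_1, k_1 > 0$ such that
$X(\zeta, \gamma, \Gamma) \le -k_1|\zeta - \zeta_0|$ for all $(\zeta, \gamma, \Gamma) \in B_{\epsilon_1}^{*k_0}$.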

The two previous lemmas can be combined with the next lemma, lemma 9,
to yield $\sqrt{n}$ rates for all of the parameters (theorem 2):

Lemma 9. There exists an $\epsilon_2 > 0$ such that $D_n \equiv \sqrt{n}(X_n - X)$
converges weakly to a tight mean zero Gaussian process $D_0$, in
$\ell^\infty(B_{\epsilon_2}^{*k_0})$, for which $D_0(\zeta, \gamma, \Gamma) \to 0$ in probability, as
$\rho_2((\zeta, \gamma, \Gamma) - (\zeta_0, \gamma_0, \Gamma_0)) \to 0$.

Theorem 2. Under the conditions of section 2, $\sqrt{n}|\hat\zeta_n - \zeta_0| = O_P(1)$,
$\sqrt{n}\|\hat\psi_n - \psi_0\|_\infty = O_P(1)$, and $\sqrt{n}\|\hat\Gamma_n - \Gamma_0\|_\infty = O_P(1)$.

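To refine the rate for $\hat\zeta_n$, we need two more lemmas, lemmas 10 and 11
below. We will also need to define the process $\zeta \mapsto X_n^*(\zeta) \equiv$

$P_n\left\{\int_0^\tau \left[\log\frac{\dot G(H^{\theta_0(\zeta,\gamma_0,\Gamma_0)}(t))}{\dot G(H^{\theta_0}(t))} + (r_{(\gamma_0,\zeta)} - r_{\xi_0})(t; Z, Y)\right] dN(t) - \left(G(H^{\theta_0(\zeta,\gamma_0,\Gamma_0)}(V)) - G(H^{\theta_0}(V))\right)\right\},$

which is the empirical counterpart of $X(\zeta, \lambda_0)$.

Lemma 10. $0 \le X_n(\hat\zeta_n, \hat\lambda_n) - X_n^*(\hat\zeta_n) \le O_P(n^{-1})$.

Lemma 11. There exists an $\epsilon_3 > 0$ and $k_2 < \infty$ such that, for all
$0 \le \epsilon \le \epsilon_3$ and $n \ge 1$,
$E\sup_{|\zeta - \zeta_0| \le \epsilon} |D_n^*(\zeta)| \le k_2\sqrt{\epsilon}$, where
$D_n^*(\zeta) \equiv \sqrt{n}\left(X_n^*(\zeta) - X(\zeta, \lambda_0)\right)$.

We now have the following theorem about the convergence rate for $\hat\zeta_n$:

Theorem 3. Under the conditions of section 2, $n|\hat\zeta_n - \zeta_0| = O_P(1)$.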

Proof. The method of proof involves a "peeling device" (see, for example,
the proof of theorem 5.1 of [15], or the proof of theorem 2 of [23]). Fix
$\epsilon > 0$. By consistency and lemma 6,
$P((\hat\zeta_n, \hat\lambda_n) \in B_{\epsilon_4}^{*k_0}) > 1 - \epsilon$ for all $n$ large enough, where
$\epsilon_4 \equiv \epsilon_1 \wedge \epsilon_2 \wedge \epsilon_3$. By lemma 10, there exists an $M^* < \infty$ such
that P{Xn{Cn, An) " X*{Cn) > Mf/n) < e. For integers A: > 1, let m^ = k^. 
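We now have, for any integer $k \ge 1$, that
$\limsup_{n \to \infty} P(n|\hat\zeta_n - \zeta_0| > m_k)$

$\le \limsup_{n \to \infty} P\left(n|\hat\zeta_n - \zeta_0| > m_k,\ (\hat\zeta_n, \hat\lambda_n) \in B_{\epsilon_4}^{*k_0},\ X_n(\hat\zeta_n, \hat\lambda_n) - X_n^*(\hat\zeta_n) \le \frac{M^*}{n}\right) + 2\epsilon$

$\le \limsup_{n \to \infty} P\left(\sup_{\zeta :\, m_k/n < |\zeta - \zeta_0| \le \epsilon_4} X_n^*(\zeta) \ge -\frac{M^*}{n}\right) + 2\epsilon$

(12) $\le \limsup_{n \to \infty} \sum_{j=k}^{k_{n,4}} P\left(\sup_{\zeta :\, m_j/n < |\zeta - \zeta_0| \le (m_{j+1}/n) \wedge \epsilon_4} D_n^*(\zeta) \ge \sqrt{n}\left(\frac{k_1 m_j}{n} - \frac{M^*}{n}\right)\right) + 2\epsilon,$

by lemma 8, where $k_{n,4} \equiv \min\{k : m_{k+1} \ge n\epsilon_4\}$. Here we use that
$X_n(\hat\zeta_n, \hat\lambda_n) \ge 0$, since $(\hat\zeta_n, \hat\lambda_n)$ maximizes $X_n$. But, by lemma 11
and Markov's inequality, for all $k$ large enough that $k_1 m_k > M^*$,

(12) $\le \limsup_{n \to \infty} \sum_{j=k}^{k_{n,4}} \frac{k_2\sqrt{m_{j+1}}}{k_1 m_j - M^*} + 2\epsilon \le \sum_{j \ge k} \frac{k_2\sqrt{m_{j+1}}}{k_1 m_j - M^*} + 2\epsilon.$

We can now choose $k < \infty$ large enough so that this last term is $\le 3\epsilon$.
Since $\epsilon > 0$ was arbitrary, we now have that
$\lim_{m \to \infty}\limsup_{n \to \infty} P(n|\hat\zeta_n - \zeta_0| > m) = 0$, and the desired
conclusion follows. $\Box$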

7. Weak convergence of the estimators. 

7.1. The asymptotic distribution of the change-point estimator. Denote
$U_{n,M} \equiv \{u = n(\zeta - \zeta_0) : \zeta \in [a, b], |u| \le M\}$ and $\zeta_{n,u} \equiv \zeta_0 + u/n$.
The limiting distribution of $n(\hat\zeta_n - \zeta_0)$ will be deduced from the
behavior of the restriction of the process
$u \mapsto n[\tilde L_n(\hat\psi_n, \zeta_{n,u}) - \tilde L_n(\hat\psi_n, \zeta_0)]$ to the compact set $U_{n,M}$,
for $M$ sufficiently large.

Theorem 4. The following approximation holds for all $M > 0$, as
$n \to \infty$:

$u \mapsto n[\tilde L_n(\hat\psi_n, \zeta_{n,u}) - \tilde L_n(\hat\psi_n, \zeta_0)] = Q_n(u) + o_P^{U_{n,M}}(1),$

where $o_P^B(1)$ denotes a term going to zero in probability uniformly over
the set $B$ and $Q_n(u) \equiv$

$nP_n\left\{\left(1\{\zeta_{n,u} < Y \le \zeta_0\} - 1\{\zeta_0 < Y \le \zeta_{n,u}\}\right)\left[l_2^{\psi_0}(V, \delta, Z) - l_1^{\psi_0}(V, \delta, Z)\right]\right\}.$

Let $Q_n(u) = Q_n^+(u)1\{u > 0\} - Q_n^-(u)1\{u < 0\}$. We now study the weak
convergence of $Q_n$ as a random variable on the space of cadlag functions
$D$ with the Skorohod topology, and on its restriction to the space $D_M$ of
cadlag functions on $[-M, M]$, for any $M > 0$, similar to the approach taken
in [27]. In order to describe the asymptotic distribution of $Q_n$, let
$\nu^+$ and $\nu^-$ be two independent jump processes on $\mathbb{R}$ such that $\nu^+(s)$
is a Poisson variable with parameter $s^+h(\zeta_0)$ and $\nu^-(s)$ is a Poisson
variable with parameter $(-s)^+h(\zeta_0)$. Here, $u^+$ denotes $u \vee 0$. Let
$(V_k^+)_{k \ge 1}$ and $(V_k^-)_{k \ge 1}$ be independent sequences of i.i.d. random
variables with characteristic functions

$\phi^+(t) = P\left[e^{it\{l_1^{\psi_0}(V, \delta, Z) - l_2^{\psi_0}(V, \delta, Z)\}} \mid Y = \zeta_0+\right]$

and

$\phi^-(t) = P\left[e^{it\{l_1^{\psi_0}(V, \delta, Z) - l_2^{\psi_0}(V, \delta, Z)\}} \mid Y = \zeta_0\right],$

respectively, where $(V_k^+)_{k \ge 1}$ and $(V_k^-)_{k \ge 1}$ are independent of $\nu^+$
and $\nu^-$.

Let $Q(s) = Q^+(s)1\{s > 0\} - Q^-(s)1\{s < 0\}$ be the right-continuous
jump process defined by

$Q^+(s) = \sum_{0 \le k \le \nu^+(s)} V_k^+ \quad\text{and}\quad Q^-(s) = \sum_{0 \le k \le \nu^-(s+)} V_k^-,$

where $V_0^+ = V_0^- = 0$. Using a modification of the arguments in [23], we
obtain:

Theorem 5. Under the regularity conditions of section 2, the process
$Q_n$ converges weakly to $Q$ in $D_M$, for every $M > 0$;
$n(\hat\zeta_n - \zeta_0) = \mathrm{argmax}_u\,Q_n(u) + o_P(1)$, which converges weakly to
$v_Q \equiv \mathrm{argmin}\{|v| : Q(v) = \max Q\}$; and $n(\hat\zeta_n - \zeta_0)$ and
$\sqrt{n}P_nU^\tau_{\zeta_0}(\psi_0)(h)$ are asymptotically independent for all
$h \in \mathcal{H}_\infty$.
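
The limiting variable $v_Q$ can be explored by simulation once the Poisson rate and the two jump distributions are specified. The sketch below is an illustration only: `rate` stands in for $h(\zeta_0)$, the callables `draw_plus` and `draw_minus` are hypothetical samplers for $V_k^+$ and $V_k^-$, the path is truncated to $[-M, M]$ via `horizon`, and the left/right-limit convention at jump points on the negative half-line is glossed over.

```python
import random

def simulate_vq(rate, draw_plus, draw_minus, horizon, rng):
    """Draw one realization of the smallest-|v| maximizer of Q on [-horizon, horizon].

    Q(0) = 0; for s > 0, Q adds V_k^+ at each Poisson arrival; for s < 0,
    Q subtracts V_k^- at each Poisson arrival (moving away from zero).
    """
    best_value, best_location = 0.0, 0.0
    for sign, draw in ((1.0, draw_plus), (-1.0, draw_minus)):
        elapsed, level = 0.0, 0.0
        while True:
            elapsed += rng.expovariate(rate)  # next arrival of a rate-`rate` Poisson process
            if elapsed > horizon:
                break
            level += sign * draw(rng)
            # strict inequality keeps the maximizer closest to zero on each side
            if level > best_value:
                best_value, best_location = level, sign * elapsed
    return best_location

if __name__ == "__main__":
    rng = random.Random(2024)
    # assumed Gaussian jumps: V^+ has negative mean and V^- positive mean,
    # so Q drifts downward on both sides of zero
    plus = lambda r: r.gauss(-1.0, 1.0)
    minus = lambda r: r.gauss(1.0, 1.0)
    draws = [simulate_vq(2.0, plus, minus, 25.0, rng) for _ in range(5)]
    print(draws)
```

Repeating the draw many times gives a Monte Carlo approximation to the law of $v_Q$ under the assumed jump distributions.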

7.2. Asymptotic normality of the regular parameters. We use
Hoffmann-Jørgensen weak convergence as described in [32]. We have the
following result:

Theorem 6. Under the conditions of theorem 1, $\sqrt{n}(\hat\psi_n - \psi_0)$ is
asymptotically linear, with influence function
$\tilde l(h) = U^\tau_{\zeta_0}(\psi_0)(\sigma_{\theta_0}^{-1}(h))$, $h \in \mathcal{H}_1$, converging weakly in the
uniform norm to a tight, mean zero Gaussian process $\mathcal{Z}$ with covariance
$E[\tilde l(g)\tilde l(h)]$, for all $g, h \in \mathcal{H}_1$. Thus $n(\hat\zeta_n - \zeta_0)$ and
$\sqrt{n}(\hat\psi_n - \psi_0)$ are asymptotically independent.

Remark 1. Since $\sqrt{n}(\hat\psi_n - \psi_0)$ is asymptotically linear, with
influence function contained in the closed linear span of the tangent space
(since $\sigma_{\theta_0}$ is continuously invertible), $\hat\psi_n$ is regular and hence as
efficient as if $\zeta_0$ were
known, by Theorem 5. 2. 3 and Theorem 5. 2. 1 of Jj/. 

8. Inference when $\alpha_0 \neq 0$ or $\eta_0 \neq 0$. In this section we develop
Monte Carlo methods for inference for the parameter estimators when it is
known that either $\alpha_0 \neq 0$ or $\eta_0 \neq 0$, i.e., it is known that condition C2 is
satisfied. In section 9, we develop a hypothesis testing procedure to assess
whether $H_0 : \alpha_0 = 0 = \eta_0$ holds (i.e., that C2 does not hold). When it is
known that $H_0$ holds, the model reduces to the usual transformation model
(see (21I)) and thus validity of the bootstrap will follow from arguments 
similar to those used in the proof of corollary 1 of jlTj] . 

8.1. Inference for the change-point. One possibility for inference for $\zeta$
is to use the subsampling bootstrap |2a], which is guaranteed to work provided
the subsample sizes $\ell_n$ satisfy $\ell_n \to \infty$ and $\ell_n/n \to 0$. However, this
approach is very computationally intensive since, for each subsample, the
likelihood must be maximized over the entire parameter space. To ameliorate
the computational strain, we propose as an alternative the following
specialized parametric bootstrap. Let $F_+$ and $F_-$ be the distribution functions
corresponding to the characteristic functions $\phi^+$ and $\phi^-$, respectively.
We need to make the following additional assumption:

B5: Both $F_+$ and $F_-$ are continuous.

Now let $\tilde m_n$ be the minimum of the number of $Y$ observations in the sample
$> \hat\zeta_n$ and the number of $Y$ observations $< \hat\zeta_n$. Now choose sequences of
possibly data-dependent integers $1 \le C_{1,n} < C_{2,n} \le \tilde m_n$ such that $C_{1,n} \to \infty$,
$C_{2,n} - C_{1,n} \to \infty$, and $C_{2,n}/n \to 0$, in probability, as $n \to \infty$. Note that if
one chooses Ci^„ to be the closest integer to rhn and C2,n to be the closest 
integer to fhr! , the given requirements will be satisfied since m„ — > 00, 
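in probability, by assumption B1. Let $X_{(1)}, \ldots, X_{(n)}$ be the complete data
observations corresponding to the order statistics $Y_{(1)}, \ldots, Y_{(n)}$ of the $Y$
observations. Also let $k_n = C_{2,n} - C_{1,n} + 1$, and define $\hat l_n$ to be the integer
satisfying $\hat\zeta_n = Y_{(\hat l_n)}$. The existence of this integer follows from the form of
the MLE.

Now, for $j = 1, \ldots, k_n$, and any $\psi \in \Psi$, define

$$V^+_{j,\psi} = l_1^{\psi}\left(V_{(\hat l_n + C_{1,n} + j - 1)}, \delta_{(\hat l_n + C_{1,n} + j - 1)}, Z_{(\hat l_n + C_{1,n} + j - 1)}\right) - l_2^{\psi}\left(V_{(\hat l_n + C_{1,n} + j - 1)}, \delta_{(\hat l_n + C_{1,n} + j - 1)}, Z_{(\hat l_n + C_{1,n} + j - 1)}\right),$$

$$V^-_{j,\psi} = l_1^{\psi}\left(V_{(\hat l_n - C_{1,n} - j)}, \delta_{(\hat l_n - C_{1,n} - j)}, Z_{(\hat l_n - C_{1,n} - j)}\right) - l_2^{\psi}\left(V_{(\hat l_n - C_{1,n} - j)}, \delta_{(\hat l_n - C_{1,n} - j)}, Z_{(\hat l_n - C_{1,n} - j)}\right),$$
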

Y^ = Y,r , ^ , . 1^ and Y- = Y,j ^ .-,. Also let F"! be the data- 
dependent distribution function for a random variable drawn with replacement
from $\{V^+_{1,\hat\psi_n}, \ldots, V^+_{k_n,\hat\psi_n}\}$, and let $F_n^-$ be the data-dependent distribution
function for a random variable drawn with replacement from $\{V^-_{1,\hat\psi_n}, \ldots, V^-_{k_n,\hat\psi_n}\}$.
By the smoothness of the terms involved, it is easy to verify that
both $\sup_{1 \le j \le k_n} |V^+_{j,\hat\psi_n} - V^+_{j,\psi_0}| = o_P(1)$ and $\sup_{1 \le j \le k_n} |V^-_{j,\hat\psi_n} - V^-_{j,\psi_0}| = o_P(1)$.
Moreover, by assumption B2(i), the fact that $n(\hat\zeta_n - \zeta_0) = O_P(1)$, and the
conditions on $C_{1,n}$ and $C_{2,n}$, we have that both $P(Y_n^- < \zeta_0 < Y_n^+) \to 1$ and
$Y_n^+ - Y_n^- = o_P(1)$. Thus, by assumption B2(ii), the collection $\{V^+_{1,\psi_0}, \ldots, V^+_{k_n,\psi_0}\}$
converges in distribution to an i.i.d. sample of random variables
with characteristic function $\phi^+$, while the collection $\{V^-_{1,\psi_0}, \ldots, V^-_{k_n,\psi_0}\}$ is
independent of the first collection and converges in distribution to an i.i.d.
sample of random variables with characteristic function $\phi^-$. By assumption
B5 and the fact that $k_n \to \infty$, in probability, we now have that both
$\sup_{v \in \mathbb{R}} |F_n^+(v) - F_+(v)| = o_P(1)$ and $\sup_{v \in \mathbb{R}} |F_n^-(v) - F_-(v)| = o_P(1)$.
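The index bookkeeping for the two resampling pools is easy to get wrong, so a minimal sketch may help. It assumes the per-observation log-likelihood contributions under the two regimes have already been evaluated at the fitted parameter and are passed in as arrays `l1` and `l2`; the function name `jump_pools` and its arguments are hypothetical, and the rank ranges follow the definitions of $V^+_{j,\psi}$ and $V^-_{j,\psi}$ as read above.

```python
import numpy as np

def jump_pools(l1, l2, y, zeta_hat, c1, c2):
    """Return the pools {V+_j} and {V-_j}, j = 1..k_n, with k_n = c2 - c1 + 1.

    l1, l2   : per-observation contributions l_1 and l_2 (hypothetical inputs)
    y        : change-point covariate, same length as l1 and l2
    zeta_hat : estimated change-point, assumed to equal one of the y values
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(y)
    diff = (np.asarray(l1, dtype=float) - np.asarray(l2, dtype=float))[order]
    # 1-based rank of zeta_hat among the order statistics of y.
    l_hat = int(np.searchsorted(y[order], zeta_hat)) + 1
    n = y.size
    if l_hat + c2 > n or l_hat - c2 - 1 < 1:
        raise ValueError("c2 is too large for the position of zeta_hat")
    # Ranks l_hat + c1, ..., l_hat + c2 (converted to 0-based slicing).
    v_plus = diff[l_hat + c1 - 1 : l_hat + c2]
    # Ranks l_hat - c2 - 1, ..., l_hat - c1 - 1 (converted to 0-based slicing).
    v_minus = diff[l_hat - c2 - 2 : l_hat - c1 - 1]
    return v_plus, v_minus
```

Both pools skip the $C_{1,n}$ observations closest to the estimated change-point, which is what keeps them on the correct side of $\zeta_0$ with probability tending to one.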

Now let $\hat h_n$ be a consistent estimator of $h(\zeta_0)$. Such an estimator can be
obtained from a kernel density estimator of $h$ based on the $Y$ observations
and evaluated at $\hat\zeta_n$. The basic idea of our parametric bootstrap is to create
a stochastic process $\hat Q_n$ defined similarly to the process $Q$ described in
section 7.1. To this end, let $\hat\nu^+$ and $\hat\nu^-$ be two independent jump processes
defined on the interval $\hat B_n = [-n(\hat\zeta_n - a), n(b - \hat\zeta_n)]$ such that $\hat\nu^+(s)$ is Poisson
with parameter $s^+ \hat h_n$ and $\hat\nu^-(s)$ is Poisson with parameter $(-s)^+ \hat h_n$. Also
let $(\hat V_k^+)_{k \ge 1}$ and $(\hat V_k^-)_{k \ge 1}$ be two independent sequences of i.i.d. random
variables drawn from $F_n^+$ and $F_n^-$, respectively, and independent of the Poisson processes.

Now construct $u \mapsto \hat Q_n(u) = \hat Q_n^+(u)1\{u > 0\} - \hat Q_n^-(u)1\{u < 0\}$ on the
interval $\hat B_n$, where $\hat Q_n^+(u) = \sum_{0 \le k \le \hat\nu^+(u)} \hat V_k^+$ and $\hat Q_n^-(u) = \sum_{0 \le k \le \hat\nu^-(u+)} \hat V_k^-$.
Finally, we compute $\hat v_n = \arg\min_{\hat B_n}\{|v| : \hat Q_n(v) = \arg\max_{\hat B_n} \hat Q_n\}$. The
following proposition now follows from the fact that $P(K \subset \hat B_n) \to 1$ for all
compact $K \subset \mathbb{R}$:

Proposition 1. The conditional distribution of $\hat v_n$ given the data is
asymptotically equal to the distribution of $v_Q$ defined in theorem 5.

Hence for any $\pi > 0$, we can consistently estimate the $\pi/2$ and $1 - \pi/2$
quantiles of $v_Q$ based on a large number of independent draws from $\hat v_n$,
which estimates we will denote by $\hat q_{\pi/2}$ and $\hat q_{1-\pi/2}$, respectively. Because
$n(\hat\zeta_n - \zeta_0)$ converges weakly to $v_Q$, an asymptotically valid $1 - \pi$ confidence
interval for $\zeta_0$ is $[\hat\zeta_n - \hat q_{1-\pi/2}/n,\ \hat\zeta_n - \hat q_{\pi/2}/n]$.
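The parametric bootstrap of this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the pools `v_plus` and `v_minus` stand for the resampling sets behind $F_n^+$ and $F_n^-$, the Gaussian kernel with Silverman's rule-of-thumb bandwidth is our own choice for $\hat h_n$, and all function names are hypothetical.

```python
import numpy as np

def gaussian_kde_at(y, point, bandwidth=None):
    """Gaussian kernel density estimate of the density of y at `point`."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if bandwidth is None:
        # Silverman's rule of thumb: an assumption, not the paper's choice.
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-0.2)
    z = (point - y) / bandwidth
    return np.exp(-0.5 * z ** 2).sum() / (n * bandwidth * np.sqrt(2.0 * np.pi))

def draw_v(rng, h_hat, left_len, right_len, v_plus, v_minus):
    """One draw of the smallest-|v| maximizer of the simulated jump process."""
    best_val, best_v = 0.0, 0.0  # the process equals 0 in a neighborhood of 0
    sides = ((right_len, v_plus, 1.0), (left_len, v_minus, -1.0))
    for length, pool, sign in sides:
        k = rng.poisson(h_hat * length)  # number of jumps on this side
        if k == 0:
            continue
        times = np.sort(rng.uniform(0.0, length, size=k))  # jump distances from 0
        jumps = rng.choice(pool, size=k, replace=True)
        levels = sign * np.cumsum(jumps)  # process value after each jump
        j = int(np.argmax(levels))        # first maximal level = closest to 0
        closer = levels[j] == best_val and times[j] < abs(best_v)
        if levels[j] > best_val or closer:
            best_val, best_v = levels[j], sign * times[j]
    return best_v

def change_point_ci(zeta_hat, n, a, b, h_hat, v_plus, v_minus,
                    level=0.95, n_draws=2000, seed=0):
    """Confidence interval for the change-point from simulated draws."""
    rng = np.random.default_rng(seed)
    left_len, right_len = n * (zeta_hat - a), n * (b - zeta_hat)
    draws = np.array([draw_v(rng, h_hat, left_len, right_len, v_plus, v_minus)
                      for _ in range(n_draws)])
    pi = 1.0 - level
    q_lo, q_hi = np.quantile(draws, [pi / 2.0, 1.0 - pi / 2.0])
    # The draws approximate n * (zeta_hat - zeta_0), hence the division by n.
    return zeta_hat - q_hi / n, zeta_hat - q_lo / n
```

No likelihood maximization is needed inside the loop, which is the computational advantage over subsampling noted above.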

8.2. Inference for regular parameters. Because $\hat\zeta_n$ is $n$-consistent for $\zeta_0$,
$\zeta_0$ can be treated as known in constructing inference for the regular parameters.
Accordingly, we propose bootstrapping the likelihood and maximizing
over $\psi$ while holding $\zeta$ fixed at $\hat\zeta_n$. This will significantly reduce the
computational demands of the bootstrap. Also, to avoid the occurrence of ties
during resampling, we suggest the following weighted bootstrap alternative
to the usual nonparametric bootstrap. First generate $n$ i.i.d. positive random
variables $\kappa_1, \ldots, \kappa_n$, with mean $0 < \mu_\kappa < \infty$, variance $0 < \sigma_\kappa^2 < \infty$, and
with $\int_0^\infty \sqrt{P(\kappa_1 > u)}\,du < \infty$. Divide each weight by the sample average of
the weights $\bar\kappa$, to obtain "standardized weights" $\kappa_1^\circ, \ldots, \kappa_n^\circ$ which sum to $n$.
For a real, measurable function $f$, define the weighted empirical measure
$\mathbb{P}_n^\circ f = n^{-1} \sum_{i=1}^n \kappa_i^\circ f(X_i)$. Recall that the nonparametric bootstrap empirical
measure $\mathbb{P}_n^* f = n^{-1} \sum_{i=1}^n \kappa_i^* f(X_i)$ uses multinomial weights $\kappa_1^*, \ldots, \kappa_n^*$,
where $E[\kappa_i^*] = 1$, $i = 1, \ldots, n$, and $\sum_{i=1}^n \kappa_i^* = n$ almost surely.
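The two weighting schemes can be sketched as follows. The exponential distribution and the optional truncation (implemented here as capping each weight at a constant, which is our reading of "truncated") are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

def standardized_weights(n, rng, truncate_at=None):
    """Positive i.i.d. weights divided by their sample mean, so they sum to n."""
    kappa = rng.exponential(1.0, size=n)
    if truncate_at is not None:
        kappa = np.minimum(kappa, truncate_at)  # assumed meaning of truncation
    return kappa / kappa.mean()

def multinomial_weights(n, rng):
    """Weights of the ordinary nonparametric bootstrap; ties (zeros) can occur."""
    return rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)

def weighted_empirical_mean(weights, values):
    """n^{-1} * sum_i weight_i * f(X_i) for precomputed values f(X_i)."""
    values = np.asarray(values, dtype=float)
    return float(np.sum(weights * values) / values.size)
```

Unlike the multinomial weights, the standardized weights are strictly positive with probability one, so every observation stays in each bootstrap sample and no ties are created.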

The proposed weighted bootstrap estimate $\hat\psi_n^\circ$ is obtained by maximizing
$L_n^\circ(\psi, \hat\zeta_n)$ over $\psi \in \Psi$, where $L_n^\circ$ is obtained by replacing $\mathbb{P}_n$ with $\mathbb{P}_n^\circ$ in
the definition of $L_n$ from section 3. We can similarly define a modified
nonparametric bootstrap $\hat\psi_n^*$ as the argmax of $\psi \mapsto L_n^*(\psi, \hat\zeta_n)$, where $L_n^*$
is obtained by replacing $\mathbb{P}_n$ with $\mathbb{P}_n^*$ in the definition of $L_n$. The following
corollary establishes the validity of both kinds of bootstraps:

Corollary 1. Under the conditions of theorem \^, the conditional
bootstrap of $\hat\psi_n$, based on either $\hat\psi_n^*$ or $\hat\psi_n^\circ$, is asymptotically consistent for
the limiting distribution $\mathcal{Z}$ in the following sense: Both $\sqrt{n}(\hat\psi_n^* - \hat\psi_n)$ and
$\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\hat\psi_n^\circ - \hat\psi_n)$ are asymptotically measurable, and both

(i) $\sup_{g \in BL_1} \left| E_* g\left(\sqrt{n}(\hat\psi_n^* - \hat\psi_n)\right) - E g(\mathcal{Z}) \right| \to 0$ in outer probability and

(ii) $\sup_{g \in BL_1} \left| E_\circ g\left(\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\hat\psi_n^\circ - \hat\psi_n)\right) - E g(\mathcal{Z}) \right| \to 0$ in outer probability,

where $BL_1$ is the space of functions mapping $\mathbb{R}^{q+d+1} \times \ell^\infty[0,\tau] \mapsto \mathbb{R}$ which
are bounded in absolute value by 1 and have Lipschitz norm $\le 1$. Here, $E_*$
and $E_\circ$ are expectations that are taken over the multinomial and standardized
weights, respectively, conditional on the data.

Remark 2. As discussed in remark 15 of \l 'V ., the choice of weights
$\kappa_1, \ldots, \kappa_n$ in this kind of setting does not affect the first order asymptotics.
However, it may have an effect in finite samples. In our experience, we have
found that both exponential and truncated exponential weights perform quite
well.

9. Test for the presence of a change-point. Constructing a valid
test of the null hypothesis that there is no change-point, $H_0 : \alpha_0 = 0 = \eta_0$,
poses an interesting challenge. Since the location of the change-point is no
longer identifiable under $H_0$, this is an example of the issue studied in Jj].
The test statistic we propose is a functional of the $\alpha$ and $\eta$ components of
the score process, $\zeta \mapsto \hat S_1(\zeta) = \sqrt{n}\,\mathbb{P}_n\left(U^{\tau}_{\zeta,1}(\hat\psi_0), U^{\tau}_{\zeta,2}(\hat\psi_0)'\right)'$, where $\zeta \in [a,b]$,
$\hat\psi_0 = (0, 0, \hat\beta_0, \hat A_0)$, and where $(\hat\beta_0, \hat A_0)$ is the restricted MLE of $(\beta_0, A_0)$
under the assumption that $\alpha = 0$ and $\eta = 0$. This MLE is relatively easy
to compute since estimation of $\zeta$ is not needed. Specifically, we have from
section 3 that $\hat\psi_0$ is the maximizer of

(13) $\psi \mapsto \mathbb{P}_n\left\{\delta \log(n\Delta A(V)) + l_1^{\psi}(V,\delta,Z)\right\}$.

We also define for future use $h \mapsto \hat S_2(h) = \sqrt{n}\,\mathbb{P}_n\left(U^{\tau}_{\zeta,3}(\hat\psi_0)(h_3), U^{\tau}_{\zeta,4}(\hat\psi_0)(h_4)\right)'$,
where $h \in \mathcal{H}_1$. The statistic we propose using is
$\hat T_n = \sup_{\zeta \in [a,b]} \left\{\hat S_1'(\zeta)\hat V_n^{-1}(\zeta)\hat S_1(\zeta)\right\}$,
where $\hat V_n(\zeta)$ is a consistent estimator of the covariance of $\hat S_1(\zeta)$.

There are several reasons for us to consider the sup functional of score
statistics instead of Wald or likelihood ratio statistics. First, the score
statistic is much less computationally intensive, which makes the bootstrap
implementation feasible. Second, we choose the sup functional because of its
guarantee to have some power under local alternatives, as argued in [1^
and which we prove below. We note, however, that [4] argue that certain
weighted averages of score statistics are optimal tests in some settings. A
careful analysis of the relative merits of the two approaches in our setting is
beyond the scope of the current paper but is an interesting topic for future
research. However, as a step in this direction, we will compare $\hat T_n$ with the
integrated statistic $\tilde T_n = \int_{[a,b]} \left\{\hat S_1'(\zeta)\hat V_n^{-1}(\zeta)\hat S_1(\zeta)\right\} d\zeta$.
In this section, we first discuss a Monte Carlo technique which enables
computation of $\hat V_n(\zeta)$, so that $\hat T_n$ and $\tilde T_n$ can be calculated in the first
place, as well as computation of critical values for hypothesis testing. We
then discuss the asymptotic properties of the statistics under a sequence of
contiguous alternatives so that power can be verified. Specifically, we assume
that all the conditions of section 2 hold except for C2, which we replace with

C2': For each $n \ge 1$, $\alpha_0 = \alpha_*/\sqrt{n}$ and $\eta_0 = \eta_*/\sqrt{n}$, for some fixed $\alpha_* \in \mathbb{R}$
and $\eta_* \in \mathbb{R}^q$. The joint distribution of $(C, Z, Y)$ does not change with
$n$.

Note that when $\alpha_* \neq 0$ or $\eta_* \neq 0$, condition C2' will cause the distribution
of the failure time $T$, given the covariates $(Z, Y)$, to change with $n$, and the
value of $\zeta_0$ will affect this distribution.

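9.1. Monte Carlo computation and inference. While the nonparametric
bootstrap may be a reasonable approach, it is unclear how to verify
its theoretical properties in this context. We will use instead the weighted
bootstrap, based on the multipliers $\kappa_1^\circ, \ldots, \kappa_n^\circ$ defined in section 8.2. Let
$\mathbb{P}_n^\circ$ be the corresponding weighted empirical measure, and define $\hat\psi_0^\circ$ to
be the maximizer of (13) after replacing $\mathbb{P}_n$ with $\mathbb{P}_n^\circ$. Also let $\hat S_1^\circ(\zeta) =
\sqrt{n}\,\mathbb{P}_n^\circ\left(U^{\tau}_{\zeta,1}(\hat\psi_0^\circ), U^{\tau}_{\zeta,2}(\hat\psi_0^\circ)'\right)'$. Note that the same sample of weights $\kappa_1^\circ, \ldots, \kappa_n^\circ$
is used for computing both $\hat\psi_0^\circ$ and the process $\{\hat S_1^\circ(\zeta), \zeta \in [a,b]\}$, so that
the proper dependence between the score statistic and $\hat\psi_0^\circ$ will be captured.
The structure of the set-up only requires considering values of $\zeta$ in the set
$\{Y_{(1)}, \ldots, Y_{(n)}\} \cap [a,b]$, since $\zeta \mapsto \hat S_1(\zeta)$ does not change over the intervals
$[Y_{(j)}, Y_{(j+1)})$, $1 \le j \le n-1$. Now repeat the bootstrap procedure a large number
of times $M_n$, to obtain the bootstrapped score processes $\hat S^\circ_{1,1}, \ldots, \hat S^\circ_{1,M_n}$.
Note that we are allowing the number of bootstraps to depend on $n$. Define
$\zeta \mapsto \hat\mu_n(\zeta) = M_n^{-1} \sum_{k=1}^{M_n} \hat S^\circ_{1,k}(\zeta)$ and let

$$\zeta \mapsto \hat V_n(\zeta) = M_n^{-1} \sum_{k=1}^{M_n} \left\{\hat S^\circ_{1,k}(\zeta) - \hat\mu_n(\zeta)\right\}\left\{\hat S^\circ_{1,k}(\zeta) - \hat\mu_n(\zeta)\right\}'.$$

Now we can compute the test statistics $\hat T_n$ and $\tilde T_n$ with this choice for $\hat V_n$.
To estimate critical values, we compute the standardized bootstrap test
statistics $\hat T^\circ_{n,k} = \sup_{\zeta \in [a,b]} \left\{\left[\hat S^\circ_{1,k}(\zeta) - \hat\mu_n(\zeta)\right]' \hat V_n^{-1}(\zeta) \left[\hat S^\circ_{1,k}(\zeta) - \hat\mu_n(\zeta)\right]\right\}$ and
$\tilde T^\circ_{n,k} = \int_{[a,b]} \left\{\left[\hat S^\circ_{1,k}(\zeta) - \hat\mu_n(\zeta)\right]' \hat V_n^{-1}(\zeta) \left[\hat S^\circ_{1,k}(\zeta) - \hat\mu_n(\zeta)\right]\right\} d\zeta$, for $1 \le k \le
M_n$. For a test of size $\pi$, we compare the test statistics with the $(1 - \pi)$th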
quantile of the corresponding $M_n$ standardized bootstrap statistics. The reason
we subtract off the sample mean when computing the bootstrapped test
statistics is to make sure that we are approximating the null distribution
even when the null hypothesis may not be true. What is a little unusual
about this procedure is that the bootstrap must be performed before the
statistics $\hat T_n$ and $\tilde T_n$ can be calculated in the first place. We also reiterate
that we are assuming the covariates $Z_i(\cdot)$ are observed at all time
points $V_j \le V_i$ for which $\delta_j = 1$. As noted in section 2, we are aware that
this is not necessarily valid in practice. As pointed out by a referee, this is an
important issue, and it would be worth investigating whether the bootstrap
weighting scheme could be modified to perform and account for imputation
of the missing covariate values. Nevertheless, this issue is beyond the scope
of this paper and we do not pursue it further here.
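Once the observed and bootstrapped score processes have been evaluated on a grid of candidate change-points, the remaining steps of this procedure are plain array arithmetic. The sketch below assumes that evaluation has been done elsewhere; the array layout, the function name `score_tests`, and the left-endpoint rule for the integral (chosen because the score process is piecewise constant between grid points) are our own assumptions.

```python
import numpy as np

def score_tests(s_obs, s_boot, grid, size=0.05):
    """Sup and integrated score tests with bootstrap critical values.

    s_obs  : (G, p) observed score process on the grid
    s_boot : (M, G, p) bootstrapped score processes
    grid   : (G,) increasing candidate change-points in [a, b]
    """
    mu = s_boot.mean(axis=0)                          # (G, p) bootstrap mean
    centered = s_boot - mu                            # (M, G, p)
    # Covariance estimate at each grid point: average of outer products.
    v = np.einsum('mgi,mgj->gij', centered, centered) / s_boot.shape[0]
    v_inv = np.linalg.inv(v)                          # batched (G, p, p) inverse
    widths = np.diff(grid)

    def sup_and_integral(s):
        q = np.einsum('...gi,gij,...gj->...g', s, v_inv, s)  # quadratic forms
        return q.max(axis=-1), (q[..., :-1] * widths).sum(axis=-1)

    t_sup, t_int = sup_and_integral(s_obs)
    b_sup, b_int = sup_and_integral(centered)         # centered: null reference
    crit_sup = np.quantile(b_sup, 1.0 - size)
    crit_int = np.quantile(b_int, 1.0 - size)
    return {"t_sup": float(t_sup), "t_mean": float(t_int),
            "reject_sup": bool(t_sup > crit_sup),
            "reject_mean": bool(t_int > crit_int)}
```

The bootstrap replicates are used twice, first to build the covariance estimate and then, after centering, to calibrate the critical values, which mirrors the order of operations described above.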

9.2. Asymptotic properties. In this section we establish the asymptotic
validity of the proposed test procedure. Let $P$ denote the fixed probability
distribution under the null hypothesis $H_0$, and let $P_n$ be the sequence of
probability distributions under the contiguous sequence of alternatives $H_1^n$
defined in C2'. Note that $P$ and $P_n$ can be equal if $\alpha_* = 0 = \eta_*$. We need to
study the proposed procedure under general $P_n$ to determine both its size
under the null and its power under the alternative. We will use the notation
$\overset{P_n}{\rightsquigarrow}$ to denote weak convergence under $P_n$. We need the following lemmas and
theorem:

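Lemma 12. The sequence of probability measures $P_n$ satisfies

(14) $\int \left[\sqrt{n}\left(dP_n^{1/2} - dP^{1/2}\right) - \frac{1}{2}\left(U^{\tau}_{\zeta_0,1}(\psi_*)(\alpha_*) + U^{\tau}_{\zeta_0,2}(\psi_*)(\eta_*)\right) dP^{1/2}\right]^2 \to 0,$

where $\psi_* = (0, 0, \beta_0, A_0)$.

Lemma 13. $\|\hat\psi_0 - \psi_*\|_\infty \to 0$ in probability under $P_n$.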

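Theorem 7. Under the conditions of section 2, with condition C2
replaced by C2', $\hat S_1$ converges under $P_n$ in distribution in $\ell^\infty([a,b]^{q+1})$ to
the $(q+1)$-vector process $\zeta \mapsto \mathcal{Z}_*(\zeta) + \nu_*(\zeta)$, where $\mathcal{Z}_*$ is a tight, mean
zero Gaussian $(q+1)$-vector process with $\mathrm{cov}[\mathcal{Z}_*(\zeta_1), \mathcal{Z}_*(\zeta_2)] \equiv \Sigma_*(\zeta_1, \zeta_2) =
\sigma_*^{11}(\zeta_1 \vee \zeta_2) - \sigma_*^{12}(\zeta_1)[\sigma_*^{22}]^{-1}\sigma_*^{21}(\zeta_2)$, for all $\zeta_1, \zeta_2 \in [a,b]$, where, for each
$\zeta \in [a,b]$,

$$\nu_*(\zeta) = \left\{\sigma_*^{11}(\zeta \vee \zeta_0) - \sigma_*^{12}(\zeta)[\sigma_*^{22}]^{-1}\sigma_*^{21}(\zeta_0)\right\}(\alpha_*, \eta_*')',$$

$$\sigma_*^{11}(\zeta) = \begin{pmatrix} \sigma^{11}_{\psi_*,\zeta} & \sigma^{12}_{\psi_*,\zeta} \\ \sigma^{21}_{\psi_*,\zeta} & \sigma^{22}_{\psi_*,\zeta} \end{pmatrix}, \qquad
\sigma_*^{12}(\zeta) = \begin{pmatrix} \sigma^{13}_{\psi_*,\zeta} & \sigma^{14}_{\psi_*,\zeta} \\ \sigma^{23}_{\psi_*,\zeta} & \sigma^{24}_{\psi_*,\zeta} \end{pmatrix},$$

$$\sigma_*^{21}(\zeta) = \begin{pmatrix} \sigma^{31}_{\psi_*,\zeta} & \sigma^{32}_{\psi_*,\zeta} \\ \sigma^{41}_{\psi_*,\zeta} & \sigma^{42}_{\psi_*,\zeta} \end{pmatrix}, \qquad
\sigma_*^{22} = \begin{pmatrix} \sigma^{33}_{\psi_*,\zeta} & \sigma^{34}_{\psi_*,\zeta} \\ \sigma^{43}_{\psi_*,\zeta} & \sigma^{44}_{\psi_*,\zeta} \end{pmatrix},$$

and where $\sigma^{jk}_{\theta}$, for $1 \le j, k \le 4$, is as defined in section 5.2.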

The following is the main result on the limiting distribution of the test
statistics. For the remainder of this section, we require condition B4 to
hold. As will be shown in the proof of corollary \^ condition B4 implies
that $V_*(\zeta) \equiv \Sigma_*(\zeta, \zeta)$ is positive definite for all $\zeta \in [a,b]$. Note that we
will establish consistency of $\hat V_n$ after we verify the validity of the proposed
bootstrap.

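Corollary 2. Assume B4 holds and $\hat V_n(\zeta) \to V_*(\zeta)$ in probability
under $P_n$, uniformly over $\zeta \in [a,b]$. Then

$$\hat T_n \overset{P_n}{\rightsquigarrow} \sup_{\zeta \in [a,b]} \left\{[\mathcal{Z}_*(\zeta) + \nu_*(\zeta)]' V_*^{-1}(\zeta) [\mathcal{Z}_*(\zeta) + \nu_*(\zeta)]\right\}$$

and

$$\tilde T_n \overset{P_n}{\rightsquigarrow} \int_{[a,b]} \left\{[\mathcal{Z}_*(\zeta) + \nu_*(\zeta)]' V_*^{-1}(\zeta) [\mathcal{Z}_*(\zeta) + \nu_*(\zeta)]\right\} d\zeta.$$

Thus the limiting null distributions of $\hat T_n$ and $\tilde T_n$ are $T_* \equiv \sup_{\zeta \in [a,b]} \left\{\mathcal{Z}_*'(\zeta) V_*^{-1}(\zeta) \mathcal{Z}_*(\zeta)\right\}$
and $\tilde T_* \equiv \int_{[a,b]} \left\{\mathcal{Z}_*'(\zeta) V_*^{-1}(\zeta) \mathcal{Z}_*(\zeta)\right\} d\zeta$, respectively.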

Remark 3. Note that $\nu_*(\zeta_0)$ equals the matrix $\Sigma_*(\zeta_0, \zeta_0)$ times $(\alpha_*, \eta_*')'$.
By arguments in the proof of lemma\^ we know that $\Sigma_*(\zeta_0, \zeta_0)$ is positive
definite. Thus $\nu_*(\zeta_0)$ will be strictly nonzero whenever $(\alpha_*, \eta_*')' \neq 0$. Thus
both $\hat T_n$ and $\tilde T_n$ will have power to reject $H_0$ under strictly non-null contiguous
alternatives $H_1^n$.

The following theorem is the first step in establishing the validity of the
bootstrap. For brevity, we will use the notation $\overset{P_n}{\underset{\circ}{\rightsquigarrow}}$ to denote conditional
convergence of the bootstrap, either weakly in the sense of corollary 1 or in
probability, but under $P_n$ rather than $P$.

^ p 
Theorem 8. Under the conditions of theorem ^ S^ — Si -^ Z^ in 
The following corollary yields the desired consistency of $\hat V_n$ and the validity
of the proposed bootstrap for obtaining critical values. Define
$\hat F_n(u) = M_n^{-1} \sum_{k=1}^{M_n} 1\{\hat T^\circ_{n,k} \le u\}$ and $\tilde F_n(u) = M_n^{-1} \sum_{k=1}^{M_n} 1\{\tilde T^\circ_{n,k} \le u\}$.

Corollary 3. There exists a sequence Mn -^ oo, as n ^ oo, such 
that y„ -w S=K, Vn~^ S=K, and both sup^gj^ F(u) — P \T^ <u> -^0 and sup^gjg 

F(n)-P{t* <u} 



f 71 „ 

-^ u. 

o 



10. Implementation and simulation study. We have implemented
the proposed estimation and inference procedures for both the proportional
hazards and proportional odds models. The maximum likelihood estimates
were computed using the profile likelihood $pL_n(\zeta)$ defined in section 4. A line
search over the order statistics of $Y$ is used to maximize over $\zeta$, while Newton's
method is used to maximize over $\psi$. The stationary point equation Q
can be used to profile over $A$ for each value of $\zeta$ and $\gamma$. In our experience,
the computational time of the entire procedure is reasonable. A thorough
simulation study to validate the moderate sample size performance of this
procedure and the proposed bootstrap procedures of section 8 is underway
and will be presented elsewhere.
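The outer line search is simple enough to sketch generically. The code below is not the authors' implementation: the profile criterion is passed in as a callable, and the toy criterion (a two-regime mean model maximized in closed form) merely stands in for the Newton step over $\psi$; all names are hypothetical.

```python
import numpy as np

def profile_line_search(y, profile_loglik, a, b):
    """Maximize a profile criterion over the order statistics of y in [a, b]."""
    y = np.asarray(y, dtype=float)
    candidates = np.sort(y[(y >= a) & (y <= b)])
    values = np.array([profile_loglik(z) for z in candidates])
    best = int(np.argmax(values))
    return candidates[best], values[best]

def two_mean_profile(x, y):
    """Toy profile criterion: minus the residual sum of squares when x has
    one mean for y <= zeta and another for y > zeta (inner step in closed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def criterion(zeta):
        left, right = x[y <= zeta], x[y > zeta]
        rss = ((left - left.mean()) ** 2).sum()
        if right.size:
            rss += ((right - right.mean()) ** 2).sum()
        return -rss

    return criterion
```

Because the criterion is piecewise constant in $\zeta$ between consecutive order statistics of $Y$, evaluating it at the order statistics alone loses nothing.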

Because of the unusual form of the statistical tests proposed in section 9,
we feel it is worthwhile at this point to present a small simulation
study evaluating their moderate sample size performance. Both the proportional
hazards and proportional odds models were considered. A single
time-independent covariate with a standard normal distribution was used,
so that $d = q = 1$, and the change-point covariate $Y$ also had a standard normal
distribution. The parameter values were set at $\zeta_0 = 0$, $\alpha_0 = 0$, $\beta_0 = 1$,
$\eta_0 \in \{0, -0.5, -1, -2, -3\}$, and $A_0(t) = t$. The range of $\eta_0$ values includes
the null hypothesis $H_0$ (when $\eta_0 = 0$) and several alternative hypotheses.
The censoring time was exponentially distributed with rate 0.1 and truncated
at 10. This resulted in a censoring rate of about 25%. The sample size
for each simulated data set was 300. For each simulated data set, 250 bootstraps
were generated with standard exponential weights truncated at 5, to
compute $\hat V_n$ and the critical values for the two test statistics, $\hat T_n$ (the "sup
score test") and $\tilde T_n$ (the "mean score test"). The range for $\zeta$ was restricted
to the inner 80% of the $Y$ values. Each scenario was replicated 250 times.
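A data generator matching this description might look as follows. It is a sketch under assumptions: the survival function is taken to be $\exp(-t e^{r})$ for proportional hazards and $1/(1 + t e^{r})$ for proportional odds, with $r = \beta_0 Z + (\alpha_0 + \eta_0 Z)1\{Y > \zeta_0\}$ and $A_0(t) = t$, and "truncated at 10" is read as capping the censoring time at 10. The function name is hypothetical.

```python
import numpy as np

def simulate_data(n, eta0, model="ph", alpha0=0.0, beta0=1.0, zeta0=0.0, seed=0):
    """Return (v, delta, z, y): observed time, event indicator, covariates."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                  # regression covariate
    y = rng.standard_normal(n)                  # change-point covariate
    r = beta0 * z + (alpha0 + eta0 * z) * (y > zeta0)
    u = 1.0 - rng.uniform(size=n)               # uniform on (0, 1]
    if model == "ph":
        t = -np.log(u) * np.exp(-r)             # solves exp(-t e^r) = u
    elif model == "po":
        t = (1.0 / u - 1.0) * np.exp(-r)        # solves 1 / (1 + t e^r) = u
    else:
        raise ValueError("model must be 'ph' or 'po'")
    # Exponential censoring with rate 0.1 (mean 10), capped at 10 (assumption).
    c = np.minimum(rng.exponential(scale=10.0, size=n), 10.0)
    v = np.minimum(t, c)
    delta = (t <= c).astype(int)
    return v, delta, z, y
```

Calling `simulate_data(300, -1.0, model="po")` gives one data set of the size used in the study; the realized censoring fraction can be checked as `1 - delta.mean()`.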

The results of the simulation study are presented in table 1.
The type I error (the $\eta_0 = 0$ column) is quite close to the targeted 0.05
level, and the power increases with the magnitude of $\eta_0$. Also, the sup test
is notably more powerful than the mean test for all alternatives. We also
Table 1
Results from the simulation study of the sup and mean score test statistics in the
proportional hazards and proportional odds models. The sample size is 300, the level of
censoring approximately 25%, and the nominal type I error is 0.05. 250 replicates were
generated for each configuration. The parameters were set at $\zeta_0 = 0$, $\alpha_0 = 0$, $\beta_0 = 1$, and
$A_0(t) = t$, with the value of $\eta_0$ varying. The worst-case Monte Carlo standard error for
the power estimates is $0.03 = 0.50/\sqrt{250}$.

Proportional hazards model

Sup score test statistic    Null η0 = 0    η0 = -0.5    η0 = -1    η0 = -2    η0 = -3
mean                              5.078        5.590      7.874     13.524     35.507
Standard Deviation                2.728        2.859      3.919      6.992     11.337
power                             0.044        0.076      0.180      0.536      0.980

Mean score test statistic   Null η0 = 0    η0 = -0.5    η0 = -1    η0 = -2    η0 = -3
mean                              1.403        1.694      2.560      5.412      5.529
Standard Deviation                1.206        1.104      1.597      2.492      2.683
power                             0.040        0.050      0.120      0.236      0.304

Proportional odds model

Sup score test statistic    Null η0 = 0    η0 = -0.5    η0 = -1    η0 = -2    η0 = -3
mean                              3.950        4.762      5.693      8.327     13.956
Standard Deviation                2.390        1.610      1.255      2.901      4.244
power                             0.043        0.068      0.112      0.364      0.660

Mean score test statistic   Null η0 = 0    η0 = -0.5    η0 = -1    η0 = -2    η0 = -3
mean                              1.177        1.912      2.848      3.265      4.349
Standard Deviation                0.946        1.078      1.360      1.498      1.718
power                             0.048        0.056      0.116      0.167      0.285



tried the nonparametric bootstrap and found that it did not work nearly as
well. While it is difficult to make sweeping generalizations from such a small
numerical study, it appears that the proposed test statistics match the
theoretical predictions and have reasonable power. More simulation studies
into the properties of these statistics would be worthwhile, especially studies
of the impact of time-dependent covariates.
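The Monte Carlo standard error quoted in the caption of table 1 is the binomial standard error of a rejection proportion, which is largest when the true rejection probability is 0.5. A small check (the function name is ours):

```python
import math

def mc_standard_error(p, replicates):
    """Standard error of a proportion estimated from independent replicates."""
    return math.sqrt(p * (1.0 - p) / replicates)

worst_case = mc_standard_error(0.5, 250)    # equals 0.5 / sqrt(250)
at_nominal = mc_standard_error(0.05, 250)   # relevant for the type I error column
```

With 250 replicates the standard error at the nominal level 0.05 is about 0.014, so estimated sizes between roughly 0.02 and 0.08 are within two standard errors of the target.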

11. Proofs. Proof of lemma 1. Verification of D1 is straightforward.
For D2, we have for all $u > 0$,

$$\left|\frac{\ddot\Lambda(u)}{\dot\Lambda(u)}\right| = \frac{E\left[W^2 e^{-uW}\right]}{E\left[W e^{-uW}\right]} \le \frac{E[W^2]}{E[W]} < \infty.$$

The second-to-last inequality requires some justification. Note that the
probability measure $Qf(W) = E[f(W)W]/E[W]$ is well-defined for functions $f$
bounded by $O(W^3)$ by the positivity of $W$ and the existence of a fourth
moment. Now we have

$$\frac{E\left[W^2 e^{-uW}\right]}{E\left[W e^{-uW}\right]} = \frac{Q\left[W e^{-uW}\right]}{Q\left[e^{-uW}\right]} \le Q[W] = \frac{E[W^2]}{E[W]},$$

since $e^{-uW}$ uniformly down-weights larger values of $W$ and thus forces the
left term of the inequality to be decreasing in $u$. This proves the first part.
For the second part, take $c_0 = c$, and note that

$$|u^{c}\Lambda(u)| = E\left[u^{c} e^{-uW}\right] = E\left[W^{-c}(uW)^{c} e^{-uW}\right] \le k E\left[W^{-c}\right] < \infty,$$

where $k = \sup_{x \ge 0} x^{c} e^{-x} < \infty$. Similarly,

$$|u^{1+c}\dot\Lambda(u)| = E\left[u^{1+c} W e^{-uW}\right] = E\left[W^{-c}(uW)^{1+c} e^{-uW}\right] \le k' E\left[W^{-c}\right] < \infty,$$

where $k' = \sup_{x \ge 0} x^{1+c} e^{-x} < \infty$. This concludes the proof. □
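The monotonicity used in the first part of this proof can be checked numerically. The sketch below replaces the expectations with sample averages over a hypothetical gamma-distributed $W$ (the distribution and all names are our own choices); for any positive sample the ratio is nonincreasing in $u$, because its derivative is minus a variance under the tilted weights $w e^{-uw}$.

```python
import numpy as np

def tilted_ratio(w, u):
    """Sample version of E[W^2 exp(-uW)] / E[W exp(-uW)]."""
    e = np.exp(-u * w)
    return np.sum(w ** 2 * e) / np.sum(w * e)

rng = np.random.default_rng(0)
w = rng.gamma(shape=2.0, scale=0.5, size=10_000)   # hypothetical positive W
u_grid = np.linspace(0.0, 5.0, 51)
ratios = np.array([tilted_ratio(w, u) for u in u_grid])
bound = np.mean(w ** 2) / np.mean(w)               # value of the ratio at u = 0
```

Here `ratios[0]` coincides with `bound`, and every later entry lies below it, in line with the displayed inequality.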

Proof of lemma 2. Suppose that

(15) $G\left(\int_0^t Y(s)e^{r_{\xi}(s;Z,Y)}dA(s)\right) = G\left(\int_0^t Y(s)e^{r_{\xi_0}(s;Z,Y)}dA_0(s)\right)$

for all $t \in [0, \tau]$ almost surely under $P$. The target is to show that (15)
implies that $\xi = \xi_0$ and $A = A_0$ on $[0, \tau]$. By condition A1, (15) implies

$$\int_0^t Y(s)e^{r_{\xi}(s;Z,Y)}dA(s) = \int_0^t Y(s)e^{r_{\xi_0}(s;Z,Y)}dA_0(s)$$

for all $t \in [0, \tau]$ almost surely. Taking the Radon-Nikodym derivative of both
sides with respect to $A_0$, and taking logarithms, we obtain

(16) $\beta'Z(t) + (\alpha + \eta'Z_2(t))1\{Y > \zeta\} - \beta_0'Z(t) - (\alpha_0 + \eta_0'Z_2(t))1\{Y > \zeta_0\} + \log(a(t)) = 0,$

almost surely, where $a = dA/dA_0$.

Assume that $\zeta > \zeta_0$. Now choose $y < \zeta_0$ such that $y \in \tilde V(\zeta_0)$ and
$\mathrm{var}[Z(t_1)|Y = y]$ is positive definite, where $t_1$ is as defined in B3. Note
that this is possible by assumptions B2 and B3. Conditioning the left-hand
side of (16) on $Y = y$ and evaluating at $t = t_1$ yields that $\beta = \beta_0$. Now
choose $\zeta_0 < y < \zeta$ such that $y \in \tilde V(\zeta_0)$ and $\mathrm{var}[Z(t_2)|Y = y]$ is positive
definite. Conditioning the left-hand side of (16) on $Y = y$, and evaluating
at $t = t_2$, yields that $\eta_0 = 0$. Because the density of $Y$ is positive in $\tilde V(\zeta_0)$,
we also see that $\alpha_0 = 0$. But this is not possible by condition C2. A similar
argument can be used to show that $\zeta < \zeta_0$ is impossible. Thus $\zeta = \zeta_0$. Now
it is not hard to argue that condition B3 forces $\beta = \beta_0$, $\eta = \eta_0$ and $\alpha = \alpha_0$.
Hence $\log(a(t)) = 0$ for all $t \in [0, \tau]$, and the proof is complete. □

Proof of lemma. Note that for each $n$, maximizing the log-likelihood
over $A$ is equivalent to maximizing over a fixed number of parameters since
the number of jumps $K \le n$. Thus maximizing over the whole parameter $\theta$
involves maximizing an empirical average of functions that are smooth over
$\psi$ and cadlag over $\zeta$. Note also that

$$\|\hat A_n - A_0\|_{[0,\tau]} = \max_{1 \le j \le K} \left(|\hat A_n(T_j-) - A_0(T_j)| \vee |\hat A_n(T_j) - A_0(T_j)|\right),$$

where $\|\cdot\|_B$ is the uniform norm over the set $B$, and thus $\|\hat A_n - A_0\|_{[0,\tau]}$ is
measurable. Hence the uniform distance between $\hat\theta_n$ and $\theta_0$ is also measurable.
Thus almost sure convergence of $\hat\theta_n$ is equivalent to outer almost sure
convergence. Now we return to the proof. Assume

(17) $\limsup_{n \to \infty} \hat A_n(\tau) = \infty,$

with probability $> 0$. We will show that this leads to a contradiction. It
is now possible to choose a data sequence such that (17) holds and $G_n \equiv
\mathbb{P}_n N \to G_0 \equiv P_0 N$ uniformly, since the latter happens with probability 1.
Fix one such sequence $\{n\}$, and define $\tilde\theta_n = (\xi_0, \tilde A_n)$, where $\tilde A_n = G_n$. Note
that the log-likelihood difference, $L_n(\hat\theta_n) - L_n(\tilde\theta_n)$, should be non-negative
for all $n$, since $\hat\theta_n$ maximizes the log-likelihood. We are going to show that
the difference is asymptotically negative under the assumption (17).

Now choose a subsequence $\{n_k\}$ such that $\hat A_{n_k}(\tau) \to \infty$, as $k \to \infty$. We
now have, for $c_0 > 0$ from assumption D2, that $L_{n_k}(\hat\theta_{n_k}) - L_{n_k}(\tilde\theta_{n_k})$

$$\le O(1) + \mathbb{P}_{n_k}\delta\left[\log(n_k\Delta\hat A_{n_k}(V)) + \log\left(-\dot\Lambda(H^{\hat\theta_{n_k}}(V))\right)\right] - \mathbb{P}_{n_k}(1 - \delta)G(H^{\hat\theta_{n_k}}(V))$$

(18) $\le O(1) + \mathbb{P}_{n_k}\delta\log(n_k\Delta\hat A_{n_k}(V)) - \mathbb{P}_{n_k}(\delta + c_0)\log\hat A_{n_k}(V),$

since, for all $u > 0$, $\log \dot G(u) = \log[-\dot\Lambda(u)] - \log[\Lambda(u)]$; $\log[-\dot\Lambda(u)] = \log[-u^{1+c_0}\dot\Lambda(u)]
- (1 + c_0)\log(u) \le O(1) - (1 + c_0)\log(u)$ by condition D2; and since
$\log\Lambda(u) = \log[u^{c_0}\Lambda(u)] - c_0\log(u) \le O(1) - c_0\log(u)$ also by condition D2.
Next we take a partition of $[0, \tau]$, $0 = v_0 < v_1 < \cdots < v_M = \tau$, for some
finite $M$. The right hand side of (18) is now dominated by

(19) $O(1) + \log\hat A_{n_k}(\tau)\,\mathbb{P}_{n_k}\left(\delta 1\{V \in [v_{M-1}, \infty]\} - (\delta + c_0)1\{V \in [\tau, \infty]\}\right)$
$\qquad + \sum_{m=1}^{M-1} \log\hat A_{n_k}(v_m)\,\mathbb{P}_{n_k}\left(\delta 1\{V \in [v_{m-1}, v_m]\} - (\delta + c_0)1\{V \in [v_m, v_{m+1}]\}\right).$

For a fixed constant $c > 1$, we can choose this partition such that

$$P_0 N(\tau)1\{V \in [v_{M-1}, \infty]\} = P_0[N(\tau) + c_0/c]1\{V \in [\tau, \infty]\},$$

and, for $m = 1, \ldots, M - 1$,

$$P_0 N(\tau)1\{V \in [v_{m-1}, v_m]\} = P_0[N(\tau) + c_0/c]1\{V \in [v_m, v_{m+1}]\}.$$

Recalling that $G_n \to G_0$ uniformly, we obtain that (19) tends to $-\infty$ as
$k \to \infty$, which is the intended contradiction. Thus, $\limsup_{n \to \infty} \hat A_n(\tau) < \infty$
almost surely. □

Proof of theoremUl By the opening arguments in the proof of lemmalHl we
have that outer almost sure convergence is equivalent to the usual almost
sure convergence in this instance. Note that $\{\hat A_n(\tau)\}$ is bounded almost
surely, $G_n \to G_0$ almost surely, and the class

$$\mathcal{F}_{(k)} \equiv \left\{W(t;\theta) : t \in [0,\tau], \xi \in \mathcal{X}, A \in \mathcal{A}_{(k)}\right\},$$

where $\mathcal{A}_{(k)} \equiv \{A \in \mathcal{A} : A(\tau) \le k\}$, is Donsker (and hence also Glivenko-Cantelli)
for every $k < \infty$ by lemma 14 below. By similar arguments to those
used in lemma 14, we have that the class $\{G(H^{\theta}(V)) : \xi \in \mathcal{X}, A \in \mathcal{A}_{(k)}\}$
is also Glivenko-Cantelli for all $k < \infty$. We therefore have the following
with probability 1: $\{\hat A_n(\tau)\}$ is bounded asymptotically, $G_n \to G_0$ uniformly,
$(\mathbb{P}_n - P)W(\cdot;\hat\theta_n) \to 0$ uniformly, and $(\mathbb{P}_n - P)\left[G(H^{\hat\theta_n}(V)) - G(H^{\tilde\theta_n}(V))\right] \to
0$. Now, fix a sequence $\{n\}$ for which these last four asymptotic events hold.
We can now use the Helly selection theorem to find a subsequence $\{n_k\}$
and a function $A$ such that $\hat A_{n_k}(t) \to A(t)$ for all $t \in [0, \tau]$ at which $A$ is
continuous. From 0, we obtain

$$|\hat A_{n_k}(s) - \hat A_{n_k}(t)| \le O(1)\,\mathbb{P}_{n_k}|N(s) - N(t)| \to O(1)|G_0(s) - G_0(t)|,$$

for all $s, t \in [0, \tau]$. Since $G_0$ is continuous by condition C3, we know that $A$
must be continuous on all of $[0,\tau]$. Thus $\hat A_{n_k} \to A$ uniformly. Without loss
of generality, we can also assume that along this subsequence ^^^ — *■ C ^or 
some ^ G X = T X Bi X B2 X {a,b). Denote 9 = {£,, A). 
Consider now $\tilde\theta_n = (\xi_0, \tilde A_n)$, where

$$\tilde A_n(t) = \int_0^t \frac{dG_n(u)}{PW(u;\theta_0)}.$$

We can use the same technique as in the derivation of Q to show that $A_0$
satisfies

$$A_0(t) = \int_0^t \frac{dG_0(u)}{PW(u;\theta_0)},$$

for all $t \in [0,\tau]$. Thus $\tilde A_{n_k} \to A_0$ uniformly, as $k \to \infty$. At this point, we
have



< Lni^{9nJ - Lni^iOnk 

PW{u;9o 



log 



dAo[u 

J ^ dP 



_Fn,W{u;9n,) 
dA{u 



dGn,{u) - P„, [G{H'-^ {V)) - G{H'-^ {V)) 
dGo{u) - P \g{H^{V)) - G{H^\V)) 



< 0. 

But this forces $\theta = \theta_0$ by the identifiability of the model as given in lemma 2.
Thus all convergent subsequences of $\hat\theta_n$, on a set of probability 1, converge
to $\theta_0$. The desired result now follows. □

Lemma 14. For every $k < \infty$, the class $\mathcal{F}_{(k)} = \{W(t;\theta) : t \in [0, \tau], \xi \in \mathcal{X},
A \in \mathcal{A}_{(k)}\}$ is $P$-Donsker.

Proof. Routine arguments can be used to establish that the class $\mathcal{F}_1 =
\{e^{r_{\xi}(t;Z,Y)} : t \in [0, \tau], \xi \in \mathcal{X}\}$ is Donsker. Consider the map

$$h \in D[0,\tau] \mapsto \left\{\int_0^t h(s)\,dA(s) : t \in [0,\tau], A \in \mathcal{A}_{(k)}\right\} \in \ell^\infty([0,\tau] \times \mathcal{A}_{(k)}),$$

and note that it is uniformly equicontinuous and linear. Thus the class

$$\mathcal{F}_2 = \left\{\int_0^t e^{r_{\xi}(s;Z,Y)}dA(s) : t \in [0,\tau], \xi \in \mathcal{X}, A \in \mathcal{A}_{(k)}\right\}$$

is Donsker by the continuous mapping theorem. Now condition D1 ensures
that both $\dot G$ and $\ddot G/\dot G$ are Lipschitz on compacts. This fact, combined with
the facts that sums of Donsker classes are Donsker and products of bounded
Donsker classes are Donsker, yields the desired results. □

Proof of lemma ^ By the smoothness assumed in Dl of the involved 
derivatives, we have for each C, G [a, b] and tp* G ^, 



lim sup sup 

*-''° h*Glin<^:pi(fe*)<i'ieWr 



h* {a^*^sth*{h) - (T^*{h)) ds 



0. 



Thus, sup;,e^^ PUlir + h*){h) - PU^{r){h) + h* {a^.{h)) = o{pi{h*)), 
as pi{h*) ^O.D 
Proof of lemmal^ First note that for any h = {hi,h2,h3,h4^) £ 7i 
aeoih) = A(/i)+B(/i), where A(/i) = (/ii, /i2, /i3,9o/i4), B(/i) = ae„{h)-A{h), 



OO) 



and go{u) = P [y(n)e"«o(«;^'^)Hj°^ 



. It is not hard to verify that since go 



is bounded below, A is one-to-one and onto with continuous inverse defined 
by A~^(/i) = [hi, /i2, /13, h^/go). It is also not hard to verify that the operator 
B is compact as an operator on Tlr for any < r < 00. Thus the first part 
of the theorem is proved by lemma 25.93 of |3l|, if we can show that ag^ is 
one-to-one. This will then imply that for each r > 0, there is an s > with 
ag CHs) C 7ir- Now we have 

.^f IIV'K(-))II(.) -, .^^ ^^Pkea-;im)\^(^Ooih))\ _ ,^^ ||V,||(,) 



V'slin* \\V\\{r) i/^elin* WWir) t/ielin-^ W\\(r) 

> s/{4:r), since ||V'||(r) ^ 4(r/s)||^||(s). Thus ip 1-^ V('^o(")) is continuously 
invertible on its range by proposition A. 1.7 of 4]. That it is also onto with 
inverse ip 1-^ ^(c^ ) follows from ag^ being onto. All that remains is verifying 
that aog is one-to-one. 

Let h G Tioo such that o-0^{h) = 0. For the one-dimensional submodel 
defined by the map s -^ Vos = ^0 + 5(^1, ^2, ^3; /o h4{u)dAo{u)), we have 

(20) p{^^L,{^Qs,Co)\s=o}^ = P{U^Mo)W}^ = 0- 

Define the random set S{n, y, t) = {{N, Y) : N{u) = n{u),Y{u) = y{u),u G 
[t,r]}. The equality ^ implies that P{Ul^{ilJo){h)\S{n,y,t)Y = for all S 
such that P{5(n, y, t)} > 0, which implies that UA^{ipo){h) = almost surely 
for all t G [0,t]. Consider the set on which the observation (X,6,Z,Y) is 
censored at a time t G [0,r]. From (|20j) and the preceding argument, 

(21) Rl^^^^{h^l{Y > Co) + h'^Z2{t)l{Y > Co) + h'^Z{t) + h^) = 0. 

Taking the Radon-Nikodym derivative of H21|) with respect to ^0 and divid- 
ing throughout by e^^o(*'^'^) yields 

(22) Y{t){hil{Y > Co) + h'^Z2{t)liY > Co) + h'^Zit) + h^it)) = 0. 

Arguments quite similar to those used in the proof of lemma |21 can now be 
used to verify that (|^^ forces h = 0. Hence (JQ^{h) = implies /i = 0, and 
thus a Of;, is one-to-one. D 

Proof oflemma\^ For the first part, note that t >-^ Y{t) has total variation 
bounded by 1; and, by the model assumptions, the total variation of t 1— > 
Qr^(t;Z,Y) [g bounded by a universal constant that doesn't depend on 6. Thus 
there exists a universal constant k^, such that ||PnW^('; ^n)||»; < ^*IPn|Si I- 
By the smoothness of the functions involved, and the fact that u >-^ log(n) 
is Lipschitz on compacts bounded above zero, we obtain the first result 
of the lemma. The consistency part follows from lemma El combined with 
theorem^ the continuity of i-^ PW{-; 9), and reapplication of the Lipschitz 
continuity of n ^^ log(n).n 

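Proof of lemma[l\ The right-hand derivative of $P L_1(\psi,\zeta)$ with respect
to $\zeta$ at $\zeta = \zeta_0$ is

$$\frac{\partial^+}{\partial\zeta} P L_1(\psi,\zeta)\Big|_{\zeta=\zeta_0}
= \int \left\{ P[l_1^{\psi}(V,\delta,Z) \mid Y = y+] - P[l_2^{\psi}(V,\delta,Z) \mid Y = y+] \right\} \delta_{\zeta_0}(y)\,\tilde{h}(y)\,dy$$
$$= \left\{ P[l_1^{\psi}(V,\delta,Z) \mid Y = \zeta_0+] - P[l_2^{\psi}(V,\delta,Z) \mid Y = \zeta_0+] \right\} \tilde{h}(\zeta_0),$$

where the superscript $+$ denotes differentiating from the right and $\delta_{\zeta_0}(y)$ is
the Dirac delta function assigning counting measure 1 to the event $\{y = \zeta_0\}$.
Now,

$$P[l_1^{\psi}(V,\delta,Z) \mid Y = \zeta_0+] - P[l_2^{\psi}(V,\delta,Z) \mid Y = \zeta_0+]
= \int \left[ l_1^{\psi}(v,d,z) - l_2^{\psi}(v,d,z) \right] \ell_2^{\psi_0}(v,d,z)\,\ell_0^{+}(v,d,z)\,d\mu(v,d,z) \equiv R^{+}(\psi),$$

where $\ell_j^{\psi}(v,d,z) = \exp\{l_j^{\psi}(v,d,z)\}$, for $j = 1,2$; $\mu(v,d,z)$ is the
dominating measure; and $\ell_0^{+}(v,d,z)$ consists of the remaining components
of the conditional distribution of $(V,\delta,Z)$ given $Y = \zeta_0+$. Note that under
the model assumptions, $\ell_0^{+}$ does not depend on the parameters. Thus

$$R^{+}(\psi_0) = \int \left[ l_1^{\psi_0}(v,d,z) - l_2^{\psi_0}(v,d,z) \right] \ell_2^{\psi_0}(v,d,z)\,\ell_0^{+}(v,d,z)\,d\mu(v,d,z)$$
$$= \int \log\left[\frac{\ell_1^{\psi_0}}{\ell_2^{\psi_0}}\right] \ell_2^{\psi_0}\ell_0^{+}\,d\mu
< \log \int \left[\frac{\ell_1^{\psi_0}}{\ell_2^{\psi_0}}\right] \ell_2^{\psi_0}\ell_0^{+}\,d\mu
= \log \int \ell_1^{\psi_0}(v,d,z)\,\ell_0^{+}(v,d,z)\,d\mu(v,d,z) = 0,$$

since the integral of a density is 1. Thus $\dot{X}^{+}_{\zeta}(\gamma_0,\Gamma_0) < 0$.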
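The strict inequality in this step is Jensen's inequality applied to the logarithm: the expected log-ratio of a competing density to the true density is strictly negative unless the two coincide. The following is a minimal numerical sketch of that fact only, assuming two hypothetical discrete densities; the names `ell_1`, `ell_2` and their values are illustrative and do not come from the paper.

```python
import math


def expected_log_ratio(p, q):
    """Return sum_x q(x) * log(p(x) / q(x)) for discrete densities p and q.

    q plays the role of the true density and p the competing one, so the
    value equals minus the Kullback-Leibler divergence from q to p.
    """
    if abs(sum(p) - 1.0) > 1e-12 or abs(sum(q) - 1.0) > 1e-12:
        raise ValueError("p and q must each sum to one")
    return sum(qx * math.log(px / qx) for px, qx in zip(p, q) if qx > 0)


# Hypothetical stand-ins for the two densities on a three-point sample space.
ell_1 = [0.5, 0.3, 0.2]  # competing density
ell_2 = [0.2, 0.3, 0.5]  # true density just to the right of the threshold

# Analogue of R^+(psi_0): strictly negative because ell_1 differs from ell_2.
right_limit_value = expected_log_ratio(ell_1, ell_2)

# Equality case of Jensen's inequality: identical densities give zero.
same_density_value = expected_log_ratio(ell_2, ell_2)
```

Swapping the roles of the two densities gives the mirror-image statement used for the left-hand derivative.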

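A similar argument is used for the left-hand derivative. In this case, the
true density of $(V,\delta,Z)$ given $Y = \zeta_0$ is $\ell_1^{\psi_0}(v,d,z)\,\ell_0^{-}(v,d,z)$, where $\ell_0^{-}$ does
not involve the parameters. We now have

$$P[l_1^{\psi_0}(V,\delta,Z) \mid Y = \zeta_0] - P[l_2^{\psi_0}(V,\delta,Z) \mid Y = \zeta_0]
= \int \left[ l_1^{\psi_0}(v,d,z) - l_2^{\psi_0}(v,d,z) \right] \ell_1^{\psi_0}(v,d,z)\,\ell_0^{-}(v,d,z)\,d\mu(v,d,z)$$
$$= -\int \log\left[\frac{\ell_2^{\psi_0}}{\ell_1^{\psi_0}}\right] \ell_1^{\psi_0}\ell_0^{-}\,d\mu
> -\log \int \left[\frac{\ell_2^{\psi_0}}{\ell_1^{\psi_0}}\right] \ell_1^{\psi_0}\ell_0^{-}\,d\mu
= -\log \int \ell_2^{\psi_0}(v,d,z)\,\ell_0^{-}(v,d,z)\,d\mu(v,d,z) = 0,$$
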


and thus we conclude that $\dot{X}^{-}_{\zeta}(\gamma_0,\Gamma_0) > 0$. $\Box$

Proof of lemma \^ This follows from lemma 13, the local concavity of $X$,
and the smoothness of the derivatives involved. $\Box$

Proof of lemma\^ Note that X„(^, rj^ T) 

{T{t) - ro(t)} dN{t) + W{C, ri, aP) - W{Co, Vo, A^^] 
where W{Cn,A) = lt{V,6,Z)l{Y < C} + lt{V,5,Z)l{Y > (}■ The classes 
{r(t) - To{t)} dN{t) : ||r - Tolloo < e, ||r||„ < ko 



for any e > 0, and < W{C, A) : {(, A) G B*^*^ >, for some £2 > 0, can be shown 
to be Donsker. That this holds for the second class follows from arguments
similar to those used in the proof of lemma [TH. For the first class, note that
$\int_0^\tau \Gamma(t)\,dN(t) = \delta\Gamma(V)$. Since $\|\Gamma\|_v \le k_0$, $\Gamma$ can be written as the difference
between two monotone increasing functions, each with total variation
bounded by $k_0$. By theorem 2.7.5 of [32], the class of all monotone functions
with a given compact range is universally Donsker. Since sums of
Donsker classes are Donsker, we have that the class $\{\Gamma(V) : \|\Gamma\|_v \le k_0\}$
is Donsker. That the first class is Donsker now follows since products of
bounded Donsker classes are Donsker. Since we also have that $\sqrt{n}(G_n - G_0)$
converges to a Gaussian process, we have that



^/^(Pn - P) 



{rit)-ro{t)}dN{t) + w{c,7i,AP)-w{Co,m,A'^'>^) 



converges weakly in £°°{B*^'^) to the tight Gaussian process 



G 



{Tit) -Toit)} dN{t) + W{C,v,A}; 



(^)^ 



t^(Co,r?o,<"^) 



where G is the Brownian bridge measure. 
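The Donsker argument for the first class rests on writing a function of bounded total variation as the difference of two monotone increasing functions (the Jordan decomposition). A minimal sketch of that decomposition on a finite grid is given below; the function names and the sample values are hypothetical and are used only for illustration.

```python
def jordan_decomposition(values):
    """Split a sampled function into two nondecreasing parts.

    Given values f(t_0), ..., f(t_m) on an increasing grid, return (up, down)
    with both lists nondecreasing, up[0] = f(t_0), down[0] = 0, and
    values[k] == up[k] - down[k] (up to rounding) for every k.
    """
    up, down = [values[0]], [0.0]
    for prev, curr in zip(values, values[1:]):
        step = curr - prev
        up.append(up[-1] + max(step, 0.0))      # accumulate upward moves
        down.append(down[-1] + max(-step, 0.0))  # accumulate downward moves
    return up, down


def total_variation(values):
    """Total variation of the sampled function along the grid."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical sampled path standing in for Gamma on a five-point grid.
gamma_path = [0.0, 0.5, 0.2, 0.9, 0.4]
gamma_up, gamma_down = jordan_decomposition(gamma_path)
```

Each monotone part has total variation no larger than that of the original path, which mirrors the bound by $k_0$ used in the proof.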

By the smoothness of the functions and derivatives involved, we also have 



V^ {P [- lo {r(i) - To (t)} dN{t) + W{C, V, A. 



^(C,^,r)} = 
+w{Co,m,Ao] 



^P VF(C,r/,A 



(r)^ 



W^(C,r/,A 



w{Co:m,Ap^ 



(^)^ 
I 



dGnit) - dGoit) 



-V^/o^{p[Ty(t;eo(C,A))]e-rW-P[Ty(t;0o)]e-r«W} 
+€„{(, A) = - /([ C(t; C, X)dZn(t)+en{C, A), where ||e„||oo 

= op(l). The fact that the class of functions {C{-;C,X) : (C, A) G B*^o^ has 
uniformly bounded total variation yields asymptotic linearity and normality 
of {£ C{t; C, X)dZn{t) : (C, A) G 5,f »}, and the desired result follows.D 
Proof of theorem \^ By lemma 121 

-X(Cn,7n,f„) = (X„-X)(C„,7„,f„)-X„(C„,7n,f„) <Op(n-i/2). 

Combining this with lemma |HJ we obtain -^/nlCn — Col 

= ^/^ICn - Co|l{(Cn,7n,f„) G S^fo} + V^\Cn " Co| l{(Cn, 7n, f „) i?,^'} 

< -^A;f^X(C„,7„,f„) + op(l) 

< Op(l). 

Thus the first part of the lemma is proved. 

For the second part, denote U^Atp) = PUJ{ip). By arguments similar to 
those used in the proof of lemma 1141 we can verify that for some ei > 0, 
J^ = {UJ{ip){h) : \\9 — 9o\\oo < ei,h £ Tli} is Donsker. Moreover, the 
continuity of the functions involved also yields that, as ||^ — ^olloo -^ 0, 

sup,g^^ P {u^Wih) - U^^ii^omf - 0. Thus 

(23) V^(f/„^^JV^n)-C/o^^jV'n)-C/;co(^o) + f/oCo(V'o)) = o?^(l). 

Note also that ^|Cn-Co| = Op{l) implies that ^ (^oc„(^") ~ ^dco^'^")) = 
0^1(1). Thus, since C/^. (V'n) =0, ^ implies V^UL(tpn) = 

V^c/^^jV-n) + o^Hi) = -V^ (c^;co(V'o) - u^coi^o)) + o?^(i) = o^Hi), 



where Op{l) denotes a term bounded in probability uniformly over the set 
B. By lemma [SI we know that there exists a constant 62 > such that 

rdco W - U5co(^o)\\h, > e2U - V'olloo + o{U - V'olloo), 

as \\ip - V'olloo -^ 0. Hence ^/n\\'^jJn - ^"01100(62 - op(l)) < Op{l), and we 
obtain the second conclusion of the lemma. 
For the third part, we have 

V^ sup Fr,W{t;9n)-PWit;e^) =^ sup |(P„ - P)Ty(t; ^o) I + op(l) 



tG[0,r] 



iG[0,r] 



= Op(l) and ^suptgjo,^] \PW{t;en) - PW{t;9o)\ = Op(l) by the first two 
parts of this lemma. Hence \/rasuptg[o^T-] ^nW{t; On) — PW{t; Oq) = Op(l). 
The result now follows by the Lipschitz continuity of log{u) over strictly 
positive compact intervals. D 
Proof of lemma UIX The first inequality follows from the definitions. For
the second inequality, we use a Taylor expansion around $(\hat\zeta_n,\hat\gamma_n,\hat\Gamma_n)$ to

obtain $X_n(\hat\zeta_n,\hat\lambda_n) - X_n(\hat\zeta_n,\lambda_0) =$






ICn,7n,t,^n I 



(f„ ,)\ (Ao - A„), 



for some t G [0, 1], where A„,t = (7n,i,fn,t); 7n,t = t'jn+{l-t)jo;tn,t = ttn+ 
(1 - t)ro; and, for any h G Hoc, tpnj = f ^i; ^2, /^a, Jo h4{s)dAn "''\s)j. 
The score term is zero by definition of the NPMLE, and the second term has 
absolute value bounded by -fCnllAn — Ao||^, where Kn is bounded in proba- 
bility by the uniform consistency of A„ and by the form of the information 
terms listed in section 5.2. 

Now, letting ipn{7,'^) = (7>^n )> we have X„(Cn, Aq) - X*{Cn) 



(24) 



\{{l{Y<Cn}-MY<Co}) 

]Mio,ro)^y^ 5, Z) - lt"^^°'^'\v, 6, Z) - /f (y, 5, Z) + lt^{V, 5, Z)] } 



= /([p„{(l{y < u - 1{Y < Co}) y(s)^n(5)}e-roW 
where 



dGnis) - dGois) 



Kn{s) 



T'4'n,t 



G{Hl"'\V))-5 



G{Hf"'\V)) 



1pn,t I 



G{H^"-\V)) 



G{Hp\V))-6 



G{Ht"-\V)) 
G{Ht"'\V)) 



^l3'QZ{s)+ao+v'oZ2{s) 



and TJ)n,t = (7, Jo To M 



tdGn{u) + (1 - t)dGoiu) j, for some t G [0, 1], by 
the mean value theorem. By the conditions given in section 2, we have that 
there is a constant k* < 00 such that ||-ftrri(s)ro(s)IU ^ k* with probability 1 
for all n > 1. Thus the absolute value of (|2l|) is bounded above by A:*||Gn — 
Golloo X P„ 1{Y < Cn} - My < Co} = Op{n"^). This last statement fol- 
lows because ||G„-Go||oo = Op{n-^/^), (P„-P) |l{y < Cn} - 1{Y < Co} = 

op{n~^/^), and P 1{Y < C„} - 1{Y < Co} = Op{n-^/^) by theorem |2l 
Now the desired result follows. D 
Proof of lemma \11\ Note first that 



^n(c) = \/^(Fn - p) { [My < c} - My < Co}] 



n '2 



(y,5,z)} 
Denote $H = [l_1^{\psi_0} - l_2^{\psi_0}](V,\delta,Z)$, and note that $|H| \le c_*$ almost surely for a
fixed constant $c_* < \infty$. Thus $F_\epsilon = 1\{\zeta_0 - \epsilon \le Y \le \zeta_0 + \epsilon\}c_*$ serves as an
envelope for the class of functions

$\mathcal{F}_\epsilon \equiv \{[1\{Y \le \zeta\} - 1\{Y \le \zeta_0\}]H : |\zeta - \zeta_0| \le \epsilon\},$

for each $\epsilon > 0$. Note that by the assumptions on the density $\tilde{h}$ in a neighborhood
of $\zeta_0$, we have for some $\epsilon_3 > 0$ that there exist $0 < k_*, k_{**} < \infty$
such that $k_*\epsilon \le p(\epsilon) \equiv P[\zeta_0 - \epsilon \le Y \le \zeta_0 + \epsilon] \le k_{**}\epsilon$ for all $0 \le \epsilon \le \epsilon_3$.
Thus the bracketing entropy



iVn(u||F,||p,2,^„L2(P))<0 



v?p{e) 



<o 



1 



C=kU^ 



for all n > and < e < £3; and thus, by theorem 2.14.2 of |3j|, there exists 
a c=K=K < cxD such that 



E 



sup 1^(01 

Llc-Coi<€ 



_^ C^ :^t I I X^ £ I I P 2 _1 C:^ ^ C^ Y A/:^ :^t C j 



for all < e < £3. The result now follows for k2 = 0^*0^:^/1^.111 
Proof of theorem ^ We can deduce from section 3 that 

Lni^Pn, Cn,u) " Ln{'4'n, Co) 

= P„ I (1{C„,„ < r < Co} - 1{C0 < ^ < Cn,u]) 

= rr^Qn{u) + En{u), where 



{V,5,Z) 



En{u) = ¥n {1{Y < Co} - 1{Y < Cn,u}) 



'2 



Ip _ /f " + ifo 



{V,5,Z) 



By arguments similar to those used in the proof of lemma [TUl we can obtain 
constants < Fi,F2 <oo such that lf"{V,5,Z) - lf%V,5,Z) < FjUV'n - 
tpoWoo almost surely, for j = 1,2. Hence 

\K{n)\ < P„ |l{y < Co} - My < Cn,u}\ Op{n-^/^). 

By arguments given in the proof of lemma 1111 we know that 

(P„ - P)\1{Y < Co} - 1{Y < Cn,u}\ = dp'^'in-^). 

Since also sup„gu„,M -f' l^i^ ^ Co} - 1{>" < Cn,u}\ = 0{n~^) by condition 
B2(i), we now have that En = O^''^' (n^^' ^). The desired result now follows. D 
Proof of theorem\^ Fix $h \in \mathcal{H}_\infty$. We first establish that $(Q_n^+, Z_n(h))$, where
$Z_n(h) \equiv \sqrt{n}\,\mathbb{P}_n U^{\tau}_{\zeta_0}(\psi_0)(h)$, converges weakly to $(Q^+, Z(h))$ on $D_M \times \mathbb{R}$, where $Q^+$
and $Z(h)$ are independent, for each fixed $M < \infty$, and $Z(h)$ is mean zero
Gaussian with variance $\sigma^2 = \mathrm{var}[U^{\tau}_{\zeta_0}(\psi_0)(h)]$. Accordingly, fix $M$, and let
$0 = u_0 < u_1 < u_2 < \cdots < u_J \le M$ be a finite collection of points and
$q_1, \ldots, q_J, q$ be arbitrary real numbers. Our plan is to first show that the
characteristic function of $(Q_n^+(u_1), \ldots, Q_n^+(u_J), Z_n(h))$ converges to that
of $(Q^+(u_1), \ldots, Q^+(u_J))$ times that of $Z(h)$. Since the choice of points
$u_1, \ldots, u_J$ is arbitrary, this will imply convergence of all finite-dimensional
distributions. We will then show that $Q_n^+$ is asymptotically tight, and this
will imply the desired weak convergence.

Let $y \mapsto I_{nj}(y) = 1\{\zeta_0 + u_{j-1}/n < y \le \zeta_0 + u_j/n\}$, $j = 1, \ldots, J$; and
$F_i = [l_1^{\psi_0} - l_2^{\psi_0}](V_i,\delta_i,Z_i)$ and $Z_i = U^{\tau}_{\zeta_0}(\psi_0)(h)(X_i)$, $i = 1,\ldots,n$. In other
words, $Z_i$ is the score contribution from the $i$th observation. Thus



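(25) $P\exp\left\{ i\sum_{j=1}^{J} q_j\left[Q_n^+(u_j) - Q_n^+(u_{j-1})\right] + iqZ_n(h) \right\} = \prod_{k=1}^{n} P\left[ \exp\left\{ \sum_{j=1}^{J} iq_jI_{nj}(Y_k)F_k \right\} e^{iqZ_k/\sqrt{n}} \right].$

However, using the facts that $e^{\sum_j w_j} - 1 = \sum_j (e^{w_j} - 1)$ when at most one
of the $w_j$'s differs from zero and $e^{uw} - 1 = u(e^{w} - 1)$ when $u$ is dichotomous
(that is, takes values in $\{0,1\}$), we have

$$\exp\left\{ \sum_{j=1}^{J} iq_jI_{nj}(Y_k)F_k \right\} = 1 + \sum_{j=1}^{J} \left( e^{iq_jI_{nj}(Y_k)F_k} - 1 \right) = 1 + \sum_{j=1}^{J} I_{nj}(Y_k)\left( e^{iq_jF_k} - 1 \right).$$

Combining this with condition B2 and the boundedness of $F_k$ and $Z_k$, we obtain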



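$$P\left[ \exp\left\{ \sum_{j=1}^{J} iq_jI_{nj}(Y_k)F_k \right\} e^{iqZ_k/\sqrt{n}} \right]
= 1 - \frac{\sigma^2q^2}{2n} + n^{-1}\tilde{h}(\zeta_0)\sum_{j=1}^{J} (u_j - u_{j-1})\,P\left[ e^{iq_jF_k} - 1 \mid Y = \zeta_0+ \right] + o(n^{-1})$$
$$= 1 + n^{-1}\left[ -\frac{\sigma^2q^2}{2} + \tilde{h}(\zeta_0)\sum_{j=1}^{J} (u_j - u_{j-1})\{\phi^+(q_j) - 1\} \right] + o(n^{-1}),$$

where the $o(n^{-1})$ terms are uniform over $k = 1, \ldots, n$.
Thus the right-hand side of (25) converges to

$$\exp\left[ -\frac{\sigma^2q^2}{2} + \tilde{h}(\zeta_0)\sum_{j=1}^{J} (u_j - u_{j-1})\{\phi^+(q_j) - 1\} \right],$$

which is precisely

$$P\exp\left[ iqZ(h) + i\sum_{j=1}^{J} q_j\left\{ Q^+(u_j) - Q^+(u_{j-1}) \right\} \right].$$

Thus the finite dimensional distributions converge as desired.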
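The characteristic-function calculation relies on the elementary identity that, when at most one of the interval indicators equals one, the exponential of the weighted sum is one plus the sum of the indicator-weighted increments. A small numerical check is sketched below; the function names and the values of `q` and `f` are hypothetical and serve only to illustrate the identity.

```python
import cmath


def exp_of_sum(q, indicators, f):
    """Left-hand side: exp{ sum_j i * q_j * I_j * f }."""
    return cmath.exp(sum(1j * qj * ij * f for qj, ij in zip(q, indicators)))


def one_plus_sum(q, indicators, f):
    """Right-hand side: 1 + sum_j I_j * (exp{i * q_j * f} - 1)."""
    return 1 + sum(ij * (cmath.exp(1j * qj * f) - 1) for qj, ij in zip(q, indicators))


q_values = [0.7, -1.3, 2.1]  # hypothetical arguments q_1, ..., q_J
f_value = 0.45               # hypothetical value of the bounded variable F_k

# Indicator patterns with at most one nonzero entry, as for disjoint intervals.
patterns = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
max_gap = max(
    abs(exp_of_sum(q_values, pat, f_value) - one_plus_sum(q_values, pat, f_value))
    for pat in patterns
)
```

The identity fails in general when two indicators are nonzero at once, which is why the disjointness of the intervals $(\zeta_0 + u_{j-1}/n, \zeta_0 + u_j/n]$ matters.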

We next need to verify that $Q_n^+$ is asymptotically tight on $[0,M]$. Since
there exists a constant $c_* < \infty$ such that $\max_{1 \le i \le n} |F_i| \le c_* < \infty$ almost
surely, we have that $|Q_n^+(u_2) - Q_n^+(u_1)| \le c_*\,n\mathbb{P}_n1\{\zeta_0 + u_1/n < Y \le \zeta_0 + u_2/n\}$,
for all $0 \le u_1 < u_2 \le M$. Thus we are done if we can show that
$u \mapsto R_n(u) \equiv n\mathbb{P}_n1\{\zeta_0 < Y \le \zeta_0 + u/n\}$ is tight on $[0,M]$. To this end, fix
$0 \le u_1 < u_2 \le M$. Now, the expectation of $|R_n(u_2) - R_n(u_1)|$ is
$nP\{\zeta_0 + u_1/n < Y \le \zeta_0 + u_2/n\} \to |u_2 - u_1|\tilde{h}(\zeta_0)$, as $n \to \infty$. This implies the
desired tightness since $u \mapsto R_n(u)$ is monotone. We have now established
that $(Q_n^+, Z_n(h))$ converges weakly to $(Q^+, Z(h))$ on $D_M \times \mathbb{R}$, where $Q^+$ and
$Z(h)$ are independent, for each fixed $M < \infty$. Similar arguments also yield
the weak convergence of $(Q_n^-, Z_n(h))$ to $(Q^-, Z(h))$ on $D_M \times \mathbb{R}$, where
$Q^-$ and $Z(h)$ are again independent, for each fixed $M < \infty$. Thus also
$(Q_n, Z_n(h))$ converges weakly to $(Q, Z(h))$ on $D_M \times \mathbb{R}$, where $Q$ and $Z(h)$
are independent, for each fixed $M < \infty$. Since $n(\hat\zeta_n - \zeta_0) = O_P(1)$, the
argmax continuous mapping theorem (theorem 3.2.2 of [32]) now yields that
$(n(\hat\zeta_n - \zeta_0), Z_n(h))$ converges weakly to $(\arg\max Q, Z(h))$, with the desired
asymptotic independence. The remaining results follow. $\Box$
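The tightness step uses the fact that the expected number of observations falling in a window of width $u/n$ just above the threshold stays of order $u\tilde{h}(\zeta_0)$. The simulation below is a minimal sketch of that count under an assumed Uniform(0, 1) distribution for $Y$ (so the density at the threshold is 1); the threshold value, sample size and function names are hypothetical choices made for illustration.

```python
import random


def count_near_threshold(n, u, zeta0, rng):
    """R_n(u): number of Y_i in (zeta0, zeta0 + u/n] for Y_i ~ Uniform(0, 1)."""
    upper = zeta0 + u / n
    return sum(1 for _ in range(n) if zeta0 < rng.random() <= upper)


def mean_count(n, u, zeta0, reps, seed=0):
    """Monte Carlo average of R_n(u) over independent samples of size n."""
    rng = random.Random(seed)
    total = sum(count_near_threshold(n, u, zeta0, rng) for _ in range(reps))
    return total / reps


# With density 1 at the threshold, the expected count is u for every n.
estimated_mean = mean_count(n=200, u=3.0, zeta0=0.5, reps=2000)
```

Because each realization of $u \mapsto R_n(u)$ is a nondecreasing step function, controlling these expected increments is what delivers tightness in the argument above.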

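Proof of theorem We have

$$0 = \sqrt{n}\,\mathbb{P}_nU^{\tau}_{\hat\zeta_n}(\hat\psi_n)$$
$$= \sqrt{n}\,\mathbb{P}_nU^{\tau}_{\zeta_0}(\hat\psi_n) + \sqrt{n}(\mathbb{P}_n - P)\left( U^{\tau}_{\hat\zeta_n}(\hat\psi_n) - U^{\tau}_{\zeta_0}(\hat\psi_n) \right) + \sqrt{n}\,P\left( U^{\tau}_{\hat\zeta_n}(\hat\psi_n) - U^{\tau}_{\zeta_0}(\hat\psi_n) \right)$$
$$= \sqrt{n}\,\mathbb{P}_nU^{\tau}_{\zeta_0}(\hat\psi_n) + B_{1,n} + B_{2,n},$$

where the index set for the score terms is $\mathcal{H}_1$. By arguments similar to
those used in the proof of theorem 2 combined with the fact that
$n(\hat\zeta_n - \zeta_0) = O_P(1)$, we have that both $B_{1,n} = o_P^{\mathcal{H}_1}(1)$ and $B_{2,n} = o_P^{\mathcal{H}_1}(1)$. Thus
$\sqrt{n}\,\mathbb{P}_nU^{\tau}_{\zeta_0}(\hat\psi_n) = o_P^{\mathcal{H}_1}(1)$. We also have that

Combining this with lemmaEl the Z-estimator master theorem (theorem 3.3.1
of [33]) now yields the desired results. $\Box$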
Proof of corollaryU\ We first derive the unconditional limiting distribution
of $\sqrt{n}(\hat\psi_n^\circ - \psi_0)$. If a class of measurable functions $\mathcal{F}$ is $P$-Glivenko-Cantelli
with $\|P\|_{\mathcal{F}} < \infty$, then the class $\kappa \cdot \mathcal{F} \equiv \{\kappa f : f \in \mathcal{F}\}$, where $\kappa$ denotes
a generic version of one of the weights $\kappa_i$, is also $P$-Glivenko-Cantelli, by
theorem 3 of [33]. Thus we can apply the results of theorem ^ with only
minor modification, combined with the simple fact that $\bar\kappa \to \mu_\kappa$ almost
surely, to yield that $\hat\psi_n^\circ \to \psi_0$ outer almost surely. Note that the proof is
made somewhat easier than before since we already know $\hat\zeta_n \to \zeta_0$ almost
surely. Furthermore, if a class of measurable functions $\mathcal{F}$ is $P$-Donsker with
$\|P\|_{\mathcal{F}} < \infty$, then the multiplier central limit theorem (theorem 2.9.2 of [32])
yields that the class $\kappa \cdot \mathcal{F}$ is also $P$-Donsker. Hence we can apply the results
of theorem 121 with only minor modification, to yield that $\sqrt{n}(\hat\psi_n^\circ - \psi_0)$ is
asymptotically linear with influence function $\tilde{l}^\circ(h) = (\kappa/\mu_\kappa)U^{\tau}_{\zeta_0}(\psi_0)(\sigma_{\theta_0}^{-1}(h))$,
$h \in \mathcal{H}_1$. The factor $\mu_\kappa^{-1}$ occurs because the information operator for the
weighted version of the likelihood is $\mu_\kappa\sigma_{\theta_0}$. We now have that
$\sqrt{n}(\hat\psi_n^\circ - \hat\psi_n) = \sqrt{n}\,\mathbb{P}_n(\kappa/\mu_\kappa - 1)U^{\tau}_{\zeta_0}(\psi_0)(\sigma_{\theta_0}^{-1}(\cdot)) + o_P^{\mathcal{H}_1}(1)$, unconditionally.

Finally, the conditional multiplier central limit theorem (theorem 2.9.6
of [23]) yields part (ii) of the theorem. The factor $(\mu_\kappa/\sigma_\kappa)$ arises because
$\mathrm{var}(\kappa/\mu_\kappa) = \sigma_\kappa^2/\mu_\kappa^2$. Similar arguments establish (i) by using parallel
Glivenko-Cantelli and Donsker results for the nonparametric bootstrapped empirical
process. $\Box$

Proof of lemma\lS\ Let $\mu(x)$ denote the baseline measure and $p_n(x)$, $p(x)$
the density functions under $P_n$ and $P$ respectively. In the general situation,
verifying (14) is equivalent to finding a function $h$ such that:



( dP^{x) 
\ dtJ,{x) 



1/2 



\ dfj.{x) 



\l^ 



\h(x) 



dP(x) 
dfi{x) 



dfi{x) 



W^ 2^^")^(") 



dp{x) 



1 pj^) _ K(^)^!^l_ 

2(p(x))i/2 2 ^ ^(p(x))i/2 



n2 



dp{x) 



\^)iP^-))"" -\^^-)^P^-))"' 



dp{x) 



0. 



Hence the given score function satisfies (14) by the smoothness of the
log-likelihood. $\Box$

Proof of lemma IJM Note that a consequence of the Donsker theorem for
contiguous alternatives (theorem 3.10.12 of [32]) is that for any bounded
$P$-Donsker class $\mathcal{F}$, $\|\mathbb{P}_n - P\|_{\mathcal{F}} \to 0$ in probability under $P_n$. Thus the proof of lemma IHl can be
reconstituted to yield that $\|\hat{A}_n\|_{[0,\tau]}$ is bounded in probability under $P_n$,
since all of the classes of functions involved are bounded $P$-Donsker classes.
We can similarly modify the proof of theorem ^ to yield the desired results
since, once again, the only classes of functions involved are bounded and
$P$-Donsker. This is true, in particular, for the key class given in lemma ITU

for any $k < \infty$. Thus $\|\hat\psi_n - \psi_0\|_\infty \to 0$ in probability under $P_n$. $\Box$

Proof of theorem ^ The basic idea of the proof is to use the Donsker
theorem for contiguous alternatives in combination with key arguments in
the proof of theorem El and the form of the score and information operators
under model C2'. Pursuing this course, we obtain for any $(h_1, h_2) \in \mathbb{R}^{q+1}$,



{huh'2)Si{C) = V^Fn{l,l) 




^Co,3 ] ^/,J^ f r^22i-i 21/^N / hi 






iro) ( kf ]-vf (c) 



+0^:^1(1) 



where Op (1) denotes a quantity going to zero in probability, under P„, 
uniformly over the set B. Now the Donsker theorem for contiguous al- 
ternatives yields that the right-hand side converges to a tight, Gaussian 
process with covariance P[-ff=K(Ci)-ff*(C2)]> for all C11C2 £ [oi&]i and mean 
P H^ <UJi{il)Q){a^,) + UJ^2i4'o)iv*)\ • Note that we only need to compute 
the moments under the null distribution P. Careful calculations verify that 
this yields the desired results. D 

Proof of corollary\^ The limiting results under P„ follow from theorem [7| 
and the continuous mapping theorem, provided we can show that 

(26) inf v'V^Ov > 0. 

CG[a,''],'"eIR'J+l:j|t,j| = l 

The limiting null distribution results will similarly follow from the fact that 
under the null distribution P, ^'*(C) = for all C S [a, b]. Note that in both 
the null and alternative settings, Vi(C) only depends on the null limiting 
distribution. It is sufficient to verify that <7t/)*,f„ is one-to-one for all sequences 
Cn S [a J ft] and hn £ TCoo- Note that we can ignore any differences between 
Co and C in calculating (" h^ a'^* ^ because of the non-identifiability of C 
under the null hypothesis, ie., C, 1— > a^ ^ is constant. Assume now that there 
exists sequences Cn £ [«,&] and hn £ Woo such that (7^*(^hn — > 0. We 
will now show that this forces /i„ ^ 0. Without loss of generality, we can 
assume $\zeta_n \to \zeta_*$ and $h_n \to h$. Since the map $h \mapsto \sigma_{\psi_*,\zeta}h$ is continuous and
since $\zeta \mapsto \sigma_{\psi_*,\zeta}h$ is cadlag, we can further assume without loss of generality
that either $\sigma_{\psi_*,\zeta_*}h = 0$ or that $\sigma_{\psi_*,\zeta_*-}h = 0$ (the $\zeta_*-$ denotes that we are
converging to $\zeta_*$ from below). The arguments for either case are the same,
so we will for brevity only give the proof for the first case.

By the arguments surrounding expressions (20), (21) and (22), combined
with the non-identifiability of $\zeta$ under the null model, we obtain that
expression (22) must now hold for all $t \in (0,\tau]$ but with $\zeta_*$ replacing $\zeta_0$. In
other words, $\tilde{Y}(t)\left\{h_1 1\{Y > \zeta_*\} + h_2' Z_2(t) 1\{Y > \zeta_*\} + h_3' Z(t) + h_4(t)\right\} = 0$,
almost surely, for all $t \in (0,\tau]$. Since $\mathrm{var}[Z(t_4) \mid Y > \zeta_*] \ge \mathrm{var}[Z(t_4) \mid Y > b] \times P[Y > b]/P[Y > \zeta_*]$
is positive definite by condition B4, we have $h_3 = 0$.
We can similarly use B4 to verify that $\mathrm{var}[Z(t_3) \mid Y < \zeta_*]$ is positive definite
and thus $h_2 = 0$. Now $h_1 = 0$ and $h_4 = 0$ easily follow. Hence $h \mapsto \sigma_{\psi_*,\zeta}h$ is
uniformly one-to-one in a manner which yields the conclusion (26). $\Box$

Proof of theorem O The results follow from arguments similar to those
used in the proof of theorem but based on the conditional multiplier
central limit theorem for contiguous alternatives, theorem 9 below. $\Box$

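Theorem 9. (Conditional multiplier central limit theorem, for contiguous
alternatives) Let $\mathcal{F}$ be a $P$-Donsker class of measurable functions,
and let $P_n$ satisfy

$$\int \left[ \sqrt{n}\left( dP_n^{1/2} - dP^{1/2} \right) - \frac{1}{2}h\,dP^{1/2} \right]^2 \to 0,$$

as $n \to \infty$, for some real valued, measurable function $h$. Also assume
$\lim_{M\to\infty}\limsup_{n\to\infty} P_n(f - Pf)^2 1\{|f - Pf| > M\} = 0$ for all $f \in \mathcal{F}$, and that the
multipliers in the weighted bootstrap, $\kappa_1, \ldots, \kappa_n$, are i.i.d. and independent
of the data, with mean $0 < \mu_\kappa < \infty$ and variance $0 < \sigma_\kappa^2 < \infty$, and with
$\int_0^\infty \sqrt{P(|\kappa_1| > u)}\,du < \infty$. Then $\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\mathbb{P}_n^\circ - \mathbb{P}_n) \rightsquigarrow G$ in $\ell^\infty(\mathcal{F})$,
conditionally on the data under $P_n$, where $G$ is
a tight, mean zero Brownian bridge process.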

Proof. The detailed proof can be found in chapter 11 of Kosorok (To
appear). We now present a synopsis of the proof. Let $R_i = \sigma_\kappa^{-1}(\kappa_i - \mu_\kappa)$,
$i = 1, \ldots, n$, and note that



(27) 



n / \ " 

^n-V2 J2 U^x. - P) + ( ^ - ^ ) n-V2 ;^ ^^(A^^ - P) 



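where $\Delta_{X_i}$ is the Dirac measure of the observation $X_i$. Since $\mathcal{F}$ is
$P$-Donsker, we also have that $\dot{\mathcal{F}} \equiv \{f - Pf : f \in \mathcal{F}\}$ is $P$-Donsker. Thus
by the unconditional multiplier central limit theorem, we have that
$\kappa \cdot \dot{\mathcal{F}}$ is also $P$-Donsker. Now, by the fact that $\|P(f - Pf)\|_{\mathcal{F}} = 0$ (trivially)
combined with the central limit theorem under contiguous alternatives,
we have that both $f \mapsto n^{-1/2}\sum_{i=1}^{n} R_i(\Delta_{X_i} - P)f \rightsquigarrow Gf$ and
$f \mapsto n^{-1/2}\sum_{i=1}^{n} (\Delta_{X_i} - P)f \rightsquigarrow Gf + P[(f - Pf)h]$ in $\ell^\infty(\mathcal{F})$ under $P_n$. Thus the last two
terms in (27) converge to 0 in probability under $P_n$, and hence
$\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\mathbb{P}_n^\circ - \mathbb{P}_n) \rightsquigarrow G$ in $\ell^\infty(\mathcal{F})$ under $P_n$. This
now implies the unconditional asymptotic tightness and desired asymptotic
measurability of $\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\mathbb{P}_n^\circ - \mathbb{P}_n)$. Fairly standard arguments can now
be used along with the given pointwise uniform square integrability condition
to verify that $\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\mathbb{P}_n^\circ - \mathbb{P}_n)$ applied to any finite dimensional
collection $f_1, \ldots, f_m \in \mathcal{F}$ converges under $P_n$ in distribution, conditional on
the data, to the appropriate limiting Gaussian process. This now implies
that $\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\mathbb{P}_n^\circ - \mathbb{P}_n) \rightsquigarrow G$ in $\ell^\infty(\mathcal{F})$, conditionally on the data under $P_n$. $\Box$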
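To make the scaling in Theorem 9 concrete, the sketch below simulates the weighted bootstrap for a single function $f$ (the identity) and compares the conditional variance of $\sqrt{n}(\mu_\kappa/\sigma_\kappa)(\mathbb{P}_n^\circ - \mathbb{P}_n)f$ with the sample variance of the data. It is only an illustration under stated assumptions: the weights are taken to be standard exponential (so $\mu_\kappa = \sigma_\kappa = 1$), the data are simulated Gaussian values, the null distribution rather than a contiguous alternative is used, and all function names are hypothetical.

```python
import math
import random


def weighted_bootstrap_draws(data, reps, seed=0):
    """Draws of sqrt(n) * (mu/sigma) * (P_n^o - P_n) f, with f the identity.

    Weights are i.i.d. standard exponential (mu = sigma = 1) and are
    normalised by their sample mean, as in the weighted bootstrap.
    """
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    draws = []
    for _ in range(reps):
        kappa = [rng.expovariate(1.0) for _ in range(n)]
        kbar = sum(kappa) / n
        boot_mean = sum(k * x for k, x in zip(kappa, data)) / (n * kbar)
        draws.append(math.sqrt(n) * (boot_mean - mean))
    return draws


def variance(xs):
    """Population-style variance (divisor len(xs))."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


# Hypothetical data set; conditional on it, the draws should have variance
# close to the sample variance of the data.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
bootstrap_draws = weighted_bootstrap_draws(sample, reps=2000)
variance_ratio = variance(bootstrap_draws) / variance(sample)
```

With weights of mean $\mu_\kappa$ and standard deviation $\sigma_\kappa$ other than 1, the factor $\mu_\kappa/\sigma_\kappa$ would have to be applied explicitly to keep the ratio near one.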

Proof of corollary Assume at first that M„ is a fixed number M < 
oo. Theorem |H1 now yields that the collection {S^i — Si, . . . , S° - —Si} 

converges jointly, conditionally on the data, to M i.i.d. copies of Z^:. Thus 
Vn converges weakly to the sample covariance process (divided by M„ instead 
of M„ — 1) of an i.i.d. sample of M^ copies of Z^,. The same result holds 
true if we allow M„ to go to oo slowly enough. Since the Gaussian processes 
involved are tight, Vn will thus be consistent for S^,, uniformly over (^ S [a, b]. 
Similar arguments yield pointwise consistency of F and F at continuity points 
of T^< and T^,. Since it is not hard to verify that both T^, and T^< have 
continuous distributions, the pointwise consistency extends to the desired 
uniform consistency. D 